Preface I


Dear Reader, 

We would like to thank you very much for studying the proceedings volume
of the conference “Risk Management Reloaded”, which took place in Garching-Hochbrück
during September 9–13, 2013. This conference was organized by the
KPMG Center of Excellence in Risk Management and the Chair of Mathematical
Finance at Technische Universität München. The scientific committee consisted
of Prof. Claudia Klüppelberg, Prof. Matthias Scherer, Prof. Wim Schoutens, and
Prof. Rudi Zagst. Selected speakers were invited to contribute a manuscript
to this proceedings volume. We are grateful for the large number of high-quality
submissions and would like to especially thank the many referees who
helped to control and even improve the quality of the presented papers.

The objective of the conference was to bring together leading researchers and
practitioners from all areas of quantitative risk management to take advantage of the
presented methodologies and practical applications. With more than 200 registered
participants (about 40 % practitioners) and 80 presentations, we exceeded our
own expectations for this inaugural event. The broad variety of topics is also
reflected in the long list of keynote speakers and their presentations: Prof. Hansjörg
Albrecher (risk management in insurance), Dr. Christian Bluhm (credit-risk modeling
in risk management), Prof. Fabrizio Durante (dependence modeling in risk
management), Dr. Michael Kemmer (regulatory developments in risk management),
Prof. Rüdiger Kiesel (model risk for energy markets), Prof. Ralf Korn (new
mathematical developments in risk management), Prof. Alfred Müller (new risk
measures), Prof. Wim Schoutens (model, calibration, and parameter risk), and Prof.
Josef Zechner (risk management in asset management). Besides many invited and
contributed talks, the conference participants especially enjoyed a lively panel
discussion titled “Quo vadis quantitative risk management?” with Dr. Christopher
Lotz, Dr. Matthias Mayer, Vassilios Pappas, Prof. Luis Seco, and Dr. Daniel
Sommer as panelists and Markus Zydra as moderator. Moreover,
we had a special workshop on copulas (organized by Prof. Fabrizio Durante and
Prof. Matthias Scherer), a DGVFM workshop on “Alternative interest guarantees in
life insurance” (organized by Prof. Ralf Korn and Prof. Matthias Scherer),
a workshop on “Advances in LIBOR modeling” (organized by Prof. Kathrin Glau),
and a workshop on “Algorithmic differentiation” (organized by Victor Mosenkis
and Jacques du Toit). Finally, the last day of the conference was dedicated to young
researchers, serving as a platform to present results from ongoing Ph.D. projects. It
is also worth mentioning that enough time was reserved for
social events like a conference dinner at “Bräustüberl Weihenstephan,” a “Night
watchman tour” in Munich, and a goodbye reception in Garching-Hochbrück. The
editors of this volume would like to thank again all participants of the conference,
all speakers, all members of the organizing committee (Kathrin Glau, Bettina Haas,
Asma Khedher, Mirco Mahlstedt, Matthias Scherer, Anika Schmidt, Thorsten
Schulz, and Rudi Zagst), all contributors to this volume, the referees, and finally our
generous sponsor KPMG AG Wirtschaftsprüfungsgesellschaft.


Kathrin Glau 
Matthias Scherer 
Rudi Zagst 



Preface II 


The conference “Risk Management Reloaded” was held on the campus of
Technische Universität München in Garching-Hochbrück (Munich) during
September 9–13, 2013. Thanks to the great efforts of the organizers, the scientific
committee, the keynote speakers, contributors, and all other participants, the
conference was a great success, motivating academics and practitioners to learn and
discuss within the broad field of financial risk management.

The conference “Risk Management Reloaded” and this book are part of an
initiative called KPMG Center of Excellence in Risk Management that was founded
in 2012 as a very promising cooperation between the Chair of Mathematical
Finance at the Technische Universität München and KPMG AG
Wirtschaftsprüfungsgesellschaft. This collaboration aims at bringing together
practitioners from the financial industry in the areas of trading, treasury, financial
engineering, risk management, and risk controlling with academic researchers in
order to supply trendsetting and realizable improvements in the effective
management of financial risks. It is based on three pillars: the further
development of a practical and scientifically challenging education of students, the
support of research with particular focus on young researchers, and the
encouragement of exchange within the scientific community and between science
and the financial industry.

The topic of financial risk management is a subject of great importance for
banks, insurance companies, asset managers, and the treasury departments of
industrial corporations that are exposed to financial risk. It has attracted even greater
attention since the financial crisis in 2008. Although regulatory focus has increased and
the requirements on internal risk models have become more pronounced and
comprehensive, confidence in risk models and in the financial industry itself has been
damaged to some extent. We intended to discuss several questions concerning these
doubts, for example, whether we need more or fewer quantitative risk models, and
how to adequately use and manage risk models. We think that quantitative risk
models are an important tool to understand and manage the risks of what continues
to be a complex business. However, comprehensive regulation for internal models
is necessary. It is important that models can be explained to internal and external
stakeholders and are used in a suitable way.

The campus of the university in Garching-Hochbrück was a great place for the
conference. The 200 participants, 55 % of whom were academics, 40 % practitioners,
and 5 % students, had many fruitful discussions and exchanges during five
days of workshops, talks, and great social events. Participants came from more than
20 countries, which made the conference truly international. Owing to the breadth
of the main theme and the many different backgrounds of the participants, the topics
presented during the conference covered a large spectrum, ranging from regulatory
developments to theoretical advances in financial mathematics, with
speakers from both academia and industry.

The first day of the conference was dedicated to workshops on copulas, algorithmic
differentiation, guaranteed interest payments in life insurance contracts, and
LIBOR modeling. During the following days, several keynote speeches and
contributed talks treated various aspects of risk management, including market-specific
(insurance, credit, energy) challenges and tailor-made methods (model building,
calibration). The panel discussion on Wednesday brought together the views of
prestigious representatives from academia, industry, and regulation on the necessity,
reasonableness, and limitations of quantitative risk methods for the measurement
and evaluation of risk. The conference was completed by a “Young
Researchers Day” giving junior researchers the opportunity to present and discuss
their results in front of a broad audience.

We would like to thank all the participants of the conference for making this
event a great success. In particular, we express our gratitude to the scientific
committee, namely Claudia Klüppelberg, Matthias Scherer, Wim Schoutens, and
Rudi Zagst, the organizational team, namely Kathrin Glau, Bettina Haas, Asma
Khedher, Mirco Mahlstedt, Matthias Scherer, Anika Schmidt, Thorsten Schulz, and
Rudi Zagst, the keynote speakers, the participants of the panel discussion, namely
Christopher Lotz, Luis Seco, and Vasilios Pappas, all speakers within the workshops,
contributed talks, and the young researchers day, and, last but not least, all
participants who attended the conference.


Dr. Matthias Mayer 
KPMG AG Wirtschaftsprüfungsgesellschaft


Dr. Daniel Sommer 
KPMG AG Wirtschaftsprüfungsgesellschaft
Part I

Markets, Regulation, 
and Model Risk 



A Random Holding Period Approach 
for Liquidity-Inclusive Risk Management 


Damiano Brigo and Claudio Nordio 


Abstract Within the context of risk integration, we introduce stochastic holding
period (SHP) models for risk measurement. This is done in order to obtain a ‘liquidity-
adjusted risk measure’ characterized by the absence of a fixed time horizon. The
underlying assumption is that, due to changes in market liquidity conditions, one
operates along an ‘operational time’ to which the P&L process of liquidating a market
portfolio is referred. This framework leads to a mixture of distributions for the
portfolio returns, potentially allowing for skewness, heavy tails, and extreme scenarios.
We analyze the impact of possible distributional choices for the SHP. In a multivariate
setting, we hint at the possible introduction of dependent SHP processes, which
potentially lead to nonlinear dependence among the P&L processes and therefore
to tail dependence across assets in the portfolio, although this may require drastic
choices on the SHP distributions. We also find that increasing dependence as
measured by Kendall’s tau through common SHPs appears to be infeasible. We
finally discuss potential developments following future availability of market data.
This chapter is a refined version of the original working paper by Brigo and Nordio
(2010) [14].


1 Introduction 


According to the Interaction between Market and Credit Risk (IMCR) research group
of the Basel Committee on Banking Supervision (BCBS) [5], liquidity conditions
interact with market risk and credit risk through the horizon over which assets can
be liquidated. To face the impact of market liquidity risk, risk managers agree on
adopting a longer holding period to calculate the market VaR, for instance 10 business
days instead of 1; recently, BCBS has prudentially stretched such a liquidity horizon
to 3 months [6]. However, even the IMCR group pointed out that the liquidity of
traded products can vary substantially over time and in unpredictable ways, and
moreover, IMCR studies suggest that banks’ exposures to market risk and credit
risk vary with liquidity conditions in the market. The former statement suggests a
stochastic description of the time horizon over which a portfolio can be liquidated,
and the latter highlights a dependence issue.

We can start by saying that the holding period of a risky portfolio is probably
neither 10 business days nor 3 months; it could, for instance, be 10 business days
with probability 99% and 3 months with probability 1%. This is a very simple
assumption, but it may already have interesting consequences. Indeed, given the FSA
(now Bank of England) requirement to justify liquidity horizon assumptions for the
Incremental Risk Charge modeling, a simple example with the two-point liquidity
horizon distribution that we develop below could be interpreted as a mixture of
the distribution under normal conditions and of the distribution under stressed and
rare conditions. In the following we will assume no transaction costs, in order to
fully represent the liquidity risk through the holding period variability. Indeed, if
we introduce a process describing the dynamics of such liquidity conditions, for
instance,

• the process of time horizons over which the risky portfolio can be fully bought or 
liquidated, 

then the P&L is better defined by the returns calculated over such stochastic time
horizons instead of a fixed horizon (say on a daily, weekly, or monthly basis). We will
use the acronym “stochastic holding period” (SHP) for that process, which belongs
to the class of positive processes widely used in mathematical finance. We define
the liquidity-adjusted VaR or Expected Shortfall (ES) of a risky portfolio as the VaR
or ES of portfolio returns calculated over the horizon defined by the SHP process,
which is the ‘operational time’ along which the portfolio manager must operate, in
contrast to the ‘calendar time’ over which the risk manager usually measures VaR.


1.1 Earlier Literature 

Earlier literature on extending risk measures to liquidity includes several studies.
Jarrow and Subramanian [17], Bangia et al. [4], Angelidis and Benos [3], Jarrow and
Protter [18], Stange and Kaserer [25], and Ernst, Stange and Kaserer [15], among others,
propose different methods of extending risk measures to account for liquidity risk.
Bangia et al. [4] classify market liquidity risk into two categories: (a) exogenous
illiquidity, which depends on general market conditions, is common to all market players,
and is unaffected by the actions of any one participant; and (b) endogenous
illiquidity, which is specific to one’s position in the market, varies across different
market players, and is mainly related to the impact of the trade size on the bid-ask
spread. Bangia et al. [4] and Ernst et al. [15] only consider exogenous illiquidity
risk and propose a liquidity-adjusted VaR measure built using the distribution of the
bid-ask spreads. The other studies mentioned model and account for endogenous risk
in the calculation of liquidity-adjusted risk measures. In the context of the coherent
risk measures literature, the general axioms a liquidity measure should satisfy are
discussed in [1]. In that work, coherent risk measures are defined on the vector space
of portfolios (rather than on portfolio values). A key observation is that the portfolio
value can be a nonlinear map on the space of portfolios, motivating the introduction
of a nonlinear value function depending on a notion of liquidity policy based on a
general description of the microstructure of illiquid markets.

As mentioned earlier, bid-ask spreads have been used to assess liquidity risk. 
While bid-ask spreads are certainly an important measure of liquidity, they are not 
the only one. In the Credit Default Swap (CDS) space, for example, Predescu et al. 
[22] have built a statistical model that associates an ordinal liquidity score with 
each CDS reference entity. The liquidity score is built using well-known liquidity 
indicators such as the already mentioned bid-ask spreads but also using other less 
accessible predictors of market liquidity such as number of active dealers quoting 
a reference entity, staleness of quotes of individual dealers, and dispersion in
mid-quotes across market dealers. The bid-ask spread is used essentially as an indicator of
market breadth; the presence of orders on both sides of the trading book corresponds
to tighter bid-ask spreads. Dispersion of mid-quotes across dealers is a measure
of price uncertainty about the actual CDS price. Less liquid names are generally
associated with more price uncertainty and thus larger dispersion. The third liquidity
measure that is used in Predescu et al. [22] aggregates the number of active dealers and 
the individual dealers’ quote staleness into an (in)activity measure, which is meant 
to be a proxy for CDS market depth. Illiquidity increases if any of the liquidity 
predictors increases, keeping everything else constant. Therefore, liquid (less liquid) 
names are associated with smaller (larger) liquidity scores. CDS liquidity scores are 
now offered commercially by Fitch Solutions and as of 2009 provided a comparison 
of relative liquidity of over 2,400 reference entities in the CDS market globally, 
mainly concentrated in North America, Europe, and Asia. The model estimation and 
the model-generated liquidity scores are based upon the Fitch CDS Pricing Service
database, which includes single-name CDS quotes on over 3,000 entities, corporates, 
and sovereigns across about two dozen broker-dealers back to 2000. This approach 
and the related results, further highlighting the connection between liquidity and 
credit quality/rating, are summarized in [14], who further review previous research 
on liquidity components in the pricing space for CDS. 

Given the above indicators of liquidity risk, the SHP process seems to be naturally
associated with the staleness/inactivity measure. However, one may argue that the
random holding period also embeds market impact and bid-ask spreads. Indeed,
traders will also weigh the cost of closing a position or a portfolio. If bid-ask
spreads make the immediate closure of a position too expensive, market
operators might wait for bid-ask spreads to move. This will impact the holding period for
the relevant position. If we take for granted that the risk manager will not try to
model the detailed behavior of traders, then the stochastic holding period becomes a
reduced-form process for the risk manager, which possibly encapsulates a number
of aspects of liquidity risk. Ideally, as our understanding of liquidity risk progresses,
we can move to a more structural model where the dynamics of the SHP is explained
in terms of market prices and liquidity proxies, including market impact, bid-ask
spreads, and asset prices. However, in this work we sketch the features the resulting
model could have in a reduced-form spirit.

This prompts us to highlight a further feature that we should include in future 
developments of the model introduced here: we should explicitly include dependence 
between price levels and holding periods, since liquidity is certainly related to the 
level of prices in the market. 


1.2 Different Risk Horizons Are Acknowledged by BCBS 

The Basel Committee came out with a recommendation on multiple holding periods 
for different risk factors in 2012 in [7]. This document states that 

The Committee is proposing that varying liquidity horizons be incorporated in the market 
risk metric under the assumption that banks are able to shed their risk at the end of the 
liquidity horizon. [...]. This proposed liquidation approach recognises the dynamic nature of 
banks trading portfolios but, at the same time, it also recognises that not all risks can be 
unwound over a short time period, which was a major flaw of the 1996 framework. 

Further on, in Annex 4, the document details a sketch of a possible solution: assign 
a different liquidity horizon to risk factors of different types. While this is a step 
forward, it can be insufficient. How is one to decide the horizon for each risk factor, 
and especially how is one to combine the different estimates for different horizons
for assets in the same portfolio in a consistent and logically sound way? Our
random holding period approach allows one to answer the second question, but more 
generally none of the above works focuses specifically on our setup with random 
holding period, which represents a simple but powerful idea to include liquidity in 
traditional risk measures such as Value at Risk or Expected Shortfall. Our idea was 
first proposed in 2010 in [13]. 

When analyzing multiple positions, holding periods can be taken to be strongly 
dependent, in line with the first classification (a) of Bangia et al. [4] above, or 
independent, so as to fit the second category (b). We will discuss whether adding 
dependent holding periods to different positions can actually add dependence to the 
position returns. 

The paper is organized as follows. To illustrate the SHP model, first in
a univariate case (Sect. 2) and then in a bivariate one (Sect. 3), we focus on
(log)normal processes, which makes the exposition considerably easier. A brief colloquial hint at
positive processes is presented in Sect. 2, to deepen the intuition of the impact on
risk measures of introducing an SHP process. Across Sects. 3 and 4, where we try
to address the issue of calibration, we outline a possible multivariate model that
could be adopted, in principle, in a top-down approach to risk integration in
order to include liquidity risk and its dependence on other risks.
Table 1 Simplified discrete SHP

Holding period (business days)    Probability
10                                0.99
75                                0.01


Finally, we point out that this paper is meant as a proposal to open a research 
effort in stochastic holding period models for risk measures. This paper contains 
several suggestions on future developments, depending on an increased availability 
of market data. The core ideas on the SHP framework, however, are presented in this 
opening paper. 


2 The Univariate Case 

Let us suppose that we have to calculate the VaR of a market portfolio whose value
at time $t$ is $V_t$. We call $X_t = \ln V_t$, so that the log return on the portfolio value at
time $t$ over a period $h$ is

$$X_{t+h} - X_t = \ln\left(\frac{V_{t+h}}{V_t}\right) \approx \frac{V_{t+h} - V_t}{V_t}.$$

In order to include liquidity risk, the risk manager decides that a realistic, simplified
distribution of the holding period in the future will be the one given in Table 1. To
estimate the liquidity-adjusted VaR, say at time 0, the risk manager will perform a number
of simulations of $V_{0+H_0} - V_0$ with $H_0$ randomly drawn from the distribution above, and
finally will calculate the desired risk measure from the resulting distribution. If
the log-return $X_T - X_0$ is normally distributed with zero mean and variance $T$ for
deterministic $T$ (e.g., a Brownian motion, i.e., a random walk), then the risk manager
could simplify the simulation using

$$X_{0+H_0} - X_0 \mid H_0 \sim \sqrt{H_0}\,(X_1 - X_0),$$

where $\mid H_0$ denotes “conditional on $H_0$”. With this practical exercise in mind, let us generalize
this example to a generic $t$.
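The simulation shortcut just described can be sketched in Python as follows. This is a minimal illustration, assuming that $X$ is a driftless Brownian motion with unit variance per business day (so $X_1 - X_0$ is standard normal) and that $H_0$ follows the two-point distribution of Table 1; the function names are hypothetical. Each draw first samples $H_0$ and then scales one standard normal draw by $\sqrt{H_0}$.

```python
import math
import random

# Two-point SHP distribution of Table 1: (holding period in business days, probability).
SHP_TABLE_1 = [(10, 0.99), (75, 0.01)]


def sample_holding_period(rng, table=SHP_TABLE_1):
    """Draw one holding period H_0 from a discrete distribution by inversion."""
    u = rng.random()
    cumulative = 0.0
    for horizon, prob in table:
        cumulative += prob
        if u < cumulative:
            return horizon
    return table[-1][0]  # guard against floating-point round-off


def sample_log_return(rng):
    """One draw of X_{0+H_0} - X_0 via X_{0+H_0} - X_0 | H_0 ~ sqrt(H_0) (X_1 - X_0).

    Assumes X is a driftless Brownian motion with unit variance per business day,
    so that X_1 - X_0 is standard normal.
    """
    h = sample_holding_period(rng)
    return math.sqrt(h) * rng.gauss(0.0, 1.0)


rng = random.Random(42)
draws = [sample_log_return(rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
expected_var = sum(h * p for h, p in SHP_TABLE_1)  # E[H_0] = 10.65
print(f"sample variance {var:.2f}, E[H_0] = {expected_var:.2f}")
```

Because $H_0$ is independent of the Brownian increments, the variance of the mixture equals $\mathbb{E}[H_0] = 10.65$ (in units of daily variance), which the sample variance should approach.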


2.1 A Brief Review on the Stochastic Holding Period 
Framework 

A process for the risk horizon at time $t$, i.e., $t \mapsto H_t$, is a positive stochastic process
modeling the risk horizon over time. The risk measure at time $t$ will be
taken on the change in value of the portfolio over this random horizon. If $X_t$ is the
log-value of the portfolio at time $t$, the risk measure at time $t$ is to be
taken on the log-return

$$X_{t+H_t} - X_t.$$

For example, if one uses a 99 % Value at Risk (VaR) measure, the VaR will be minus the 1st
percentile of $X_{t+H_t} - X_t$. Requiring only that $H_t$ be positive means that the horizon
at future times can both increase and decrease, meaning that liquidity can vary in
both directions.

There are a large number of choices for positive processes: one can take lognormal 
processes with or without mean reversion, mean reverting square root processes, 
squared Gaussian processes, all with or without jumps. This allows one to model the 
holding period dynamics as mean reverting or not, continuous or with jumps, and 
with thinner or fatter tails. Other examples are possible, such as Variance Gamma or 
mixture processes, or Lévy processes. See, for example, [11, 12].
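As an illustration of one of the choices listed above, here is a minimal sketch of a mean-reverting lognormal holding period process, assuming $H_t = \exp(Y_t)$ with $Y$ an Ornstein-Uhlenbeck process $dY_t = \kappa(\theta - Y_t)\,dt + \eta\,dW_t$. The parameter values (long-run level $\ln 10$, speed $\kappa = 4$ per year, i.e., a half-life of roughly two months, and $\eta = 1$) are hypothetical and not calibrated to any data. The exact Gaussian transition of the OU process avoids discretization bias, and positivity of $H_t$ holds by construction.

```python
import math
import random


def simulate_log_ou_horizon(h0, theta, kappa, eta, dt, n_steps, rng):
    """Simulate H_t = exp(Y_t), where Y follows dY = kappa (theta - Y) dt + eta dW,
    sampled exactly on a grid with step dt (in years).

    All parameters are illustrative assumptions. Returns the path [H_0, ..., H_n].
    """
    decay = math.exp(-kappa * dt)
    # Standard deviation of the exact one-step OU transition.
    step_sd = eta * math.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
    y = math.log(h0)
    path = [h0]
    for _ in range(n_steps):
        y = theta + (y - theta) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(math.exp(y))
    return path


# Hypothetical parameters: log-horizon reverting to ln(10) business days,
# one simulation step per business day over one year.
rng = random.Random(7)
path = simulate_log_ou_horizon(h0=10.0, theta=math.log(10.0), kappa=4.0,
                               eta=1.0, dt=1 / 250, n_steps=250, rng=rng)
print(f"min {min(path):.2f}, max {max(path):.2f}, final {path[-1]:.2f}")
```

Other choices from the list (square-root processes, jumps) would change only the transition step; the lognormal variant is used here simply because its exact transition is elementary.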


2.2 Semi-analytic Solutions and Simulations 

Going back to the previous example, let us suppose that 

Assumption 1 The increments $X_{t+1y} - X_t$ are logarithmic returns of an equity
index, normally distributed with annual mean and standard deviation, respectively,
$\mu_{1y} = -1.5\,\%$ and $\sigma_{1y} = 30\,\%$.

We suppose an exposure of 100 in domestic currency. 

Before running the simulation, we recall some basic notation and formulas. 

The portfolio log-returns under random holding period at time 0 can be written as

$$\mathbb{P}\{X_{0+H_0} - X_0 < x\} = \int_0^\infty \mathbb{P}\{X_{0+h} - X_0 < x\}\, dF_{H,0}(h),$$

i.e., as a mixture of Gaussian returns, weighted by the holding period distribution.
Here $F_{H,t}$ denotes the cumulative distribution function of the holding period at time
$t$, i.e., of $H_t$.

Remark 1 (Mixtures for heavy-tailed and skewed distributions). Mixtures of
distributions have long been used in statistics and may lead to heavy tails, allowing
for the modeling of skewed distributions and of extreme events. Since mixtures
lead, in the space of distributions, to linear (convex) combinations of possibly
simple and well-understood distributions, they are tractable and easy to interpret.
The literature on mixtures is enormous and it is impossible to do it justice
here. We just mention that static mixtures of distributions have been
postulated in the past to fit option prices for a given maturity; see, for example, [24],
where a mixture of normal densities for the density of the asset log-returns under
the pricing measure is assumed, and subsequently [8, 16, 20]. In the last decade,
[2, 9, 10] have extended mixture distributions to fully dynamic arbitrage-free
stochastic processes for asset prices.
Table 2 SHP distributions and market risk

Holding period                        VaR 99.96% (Analytic)    ES 99.96% (Analytic)
Constant 10 b.d.                      20.1 (20.18)             21.7 (21.74)
Constant 75 b.d.                      55.7 (55.54)             60.0 (59.81)
SHP (Bernoulli 10/75, p_10 = 0.99)    29.6 (29.23)             36.1 (35.47)


Going back to our notation, $\mathrm{VaR}_{t,h,c}$ and $\mathrm{ES}_{t,h,c}$ are the value at risk and expected
shortfall, respectively, for a horizon $h$ at confidence level $c$ at time $t$, namely

$$\mathbb{P}\{X_{t+h} - X_t > -\mathrm{VaR}_{t,h,c}\} = c, \qquad \mathrm{ES}_{t,h,c} = -\mathbb{E}\left[X_{t+h} - X_t \mid X_{t+h} - X_t < -\mathrm{VaR}_{t,h,c}\right].$$


We now recall the standard result on VaR and ES under Gaussian returns in 
deterministic calendar time. 

Proposition 1 (VaR and ES with Gaussian log-returns on a deterministic risk horizon $h$)
In the Gaussian log-returns case where

$$X_{t+h} - X_t \text{ is normally distributed with mean } \mu_{t,h} \text{ and standard deviation } \sigma_{t,h}, \tag{1}$$

we obtain

$$\mathrm{VaR}_{t,h,c} = -\mu_{t,h} + \sigma_{t,h}\,\Phi^{-1}(c), \qquad \mathrm{ES}_{t,h,c} = -\mu_{t,h} + \sigma_{t,h}\,\frac{\varphi\left(\Phi^{-1}(c)\right)}{1 - c},$$

where $\varphi$ is the standard normal probability density function and $\Phi$ is the standard
normal cumulative distribution function.
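Proposition 1 is easy to evaluate directly. The sketch below assumes Brownian scaling of the annual parameters of Assumption 1 over $h/250$ years, so that $\mu_{0,h} = \mu_{1y}\,h/250$ and $\sigma_{0,h} = \sigma_{1y}\sqrt{h/250}$, and it reports risk figures as the exposure of 100 times the log-return VaR and ES; both conventions are our reading of the setup, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
DAYS_PER_YEAR = 250  # business days per year, as in the text


def gaussian_var_es(mu_h, sigma_h, c):
    """Proposition 1: VaR and ES of a N(mu_h, sigma_h^2) log-return at level c."""
    z = STD_NORMAL.inv_cdf(c)
    var = -mu_h + sigma_h * z
    es = -mu_h + sigma_h * STD_NORMAL.pdf(z) / (1.0 - c)
    return var, es


def horizon_moments(h_days, mu_1y=-0.015, sigma_1y=0.30):
    """Mean and standard deviation of the log-return over h business days,
    assuming Brownian scaling of the annual parameters of Assumption 1."""
    t = h_days / DAYS_PER_YEAR
    return mu_1y * t, sigma_1y * t ** 0.5


EXPOSURE = 100.0
for h in (10, 75):
    var, es = gaussian_var_es(*horizon_moments(h), c=0.9996)
    print(f"h = {h:2d} b.d.: VaR = {EXPOSURE * var:.2f}, ES = {EXPOSURE * es:.2f}")
```

Under these assumptions the script reproduces the analytic constant-horizon entries of Table 2: 20.18 and 21.74 for 10 business days, 55.54 and 59.81 for 75 business days.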

In the following we calculate VaR and Expected Shortfall at a confidence
level of 99.96 %, over the fixed time horizons of 10 and 75 business
days and under the SHP process with the distribution given in Table 1, using Monte Carlo
simulations. Each year has 250 (working) days. The results are presented in Table 2.
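A plain Monte Carlo version of this computation might look as follows. This is a sketch under stated assumptions: Brownian scaling of the parameters of Assumption 1 over $h/250$ years, losses expressed as $-100$ times the log-return (for the exposure of 100), empirical VaR taken as the $\lceil N(1-c) \rceil$-th largest simulated loss, and ES as the average of those tail losses. The seed, the sample size, and the helper names are arbitrary illustrative choices.

```python
import math
import random

DAYS_PER_YEAR = 250
MU_1Y, SIGMA_1Y = -0.015, 0.30  # Assumption 1
EXPOSURE = 100.0


def simulate_losses(table, n, rng):
    """Losses over a random holding period drawn from table = [(days, prob), ...],
    with Gaussian log-returns whose mean and variance scale linearly in the horizon."""
    horizons = [h for h, _ in table]
    weights = [p for _, p in table]
    losses = []
    for h in rng.choices(horizons, weights=weights, k=n):
        t = h / DAYS_PER_YEAR
        log_ret = rng.gauss(MU_1Y * t, SIGMA_1Y * math.sqrt(t))
        losses.append(-EXPOSURE * log_ret)
    return losses


def empirical_var_es(losses, c):
    """Empirical VaR (k-th largest loss) and ES (mean of the k largest losses)."""
    ordered = sorted(losses, reverse=True)
    k = max(1, math.ceil(len(ordered) * (1.0 - c)))  # number of tail scenarios
    return ordered[k - 1], sum(ordered[:k]) / k


rng = random.Random(2013)
cases = {
    "Constant 10 b.d.": [(10, 1.0)],
    "Constant 75 b.d.": [(75, 1.0)],
    "SHP (Bernoulli 10/75)": [(10, 0.99), (75, 0.01)],
}
results = {}
for name, table in cases.items():
    results[name] = empirical_var_es(simulate_losses(table, 400_000, rng), c=0.9996)
    print(f"{name:22s} VaR = {results[name][0]:5.1f}  ES = {results[name][1]:5.1f}")
```

With only about 160 tail scenarios at $c = 99.96\,\%$ the estimates are noisy, which is why the semi-analytic values in parentheses in Table 2 are a useful cross-check.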

More generally, we may derive the VaR and ES formulas for the case where $H_t$
is distributed according to a general distribution

$$\mathbb{P}(H_t < x) = F_{H,t}(x), \quad x > 0,$$

and

$$\mathbb{P}(X_{t+h} - X_t < x) = F_{X,t,h}(x).$$

Definition 1 (VaR and ES under Stochastic Holding Period) We define VaR and ES
under a random horizon $H_t$ at time $t$ and for a confidence level $c$ as

$$\mathbb{P}\{X_{t+H_t} - X_t > -\mathrm{VaR}_{H,t,c}\} = c, \qquad \mathrm{ES}_{H,t,c} = -\mathbb{E}\left[X_{t+H_t} - X_t \mid X_{t+H_t} - X_t < -\mathrm{VaR}_{H,t,c}\right].$$
We point out that the order of time/confidence/horizon arguments in the VaR and
ES definitions is different in the Stochastic Holding Period case. This is to stress the 
different setting with respect to the fixed holding period case. 

We have immediately the following: 

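Proposition 2 (VaR and ES for SHP independent of returns in deterministic calendar
time) Assume that $H$ is independent of the log returns of $X$ in deterministic calendar
time. Using the tower property of conditional expectation, it is immediate to prove
that in such a case $\mathrm{VaR}_{H,t,c}$ obeys the equation

$$\int_0^\infty \left(1 - F_{X,t,h}\left(-\mathrm{VaR}_{H,t,c}\right)\right) dF_{H,t}(h) = c,$$

whereas $\mathrm{ES}_{H,t,c}$ is given by

$$\mathrm{ES}_{H,t,c} = -\frac{1}{1-c} \int_0^\infty \mathbb{E}\left[X_{t+h} - X_t \mid X_{t+h} - X_t < -\mathrm{VaR}_{H,t,c}\right] \mathbb{P}\left(X_{t+h} - X_t < -\mathrm{VaR}_{H,t,c}\right) dF_{H,t}(h).$$

For the specific Gaussian case (1) we have

$$\int_0^\infty \Phi\left(\frac{-\mu_{t,h} - \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right) dF_{H,t}(h) = 1 - c,$$

$$\mathrm{ES}_{H,t,c} = \frac{1}{1-c} \int_0^\infty \left[-\mu_{t,h}\,\Phi\left(\frac{-\mu_{t,h} - \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right) + \sigma_{t,h}\,\varphi\left(\frac{-\mu_{t,h} - \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right)\right] dF_{H,t}(h).$$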

Notice that in general one can obtain the quantile $\mathrm{VaR}_{H,t,c}$ for the random
horizon case by a root search, and subsequently compute the expected
shortfall. Careful numerical integration is needed to apply these formulas for general
distributions of $H_t$. The case of Table 2 is somewhat trivial, since when
$H_0$ is as in Table 1 the integrals reduce to sums of two terms.
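The root search mentioned above can be sketched as follows for a discrete holding period distribution, where the integrals against $dF_{H,t}$ become finite sums. This is a minimal sketch assuming Assumption 1 with Brownian scaling over $h/250$ years; the bisection bracket $[-1, 10]$ in log-return units is an assumption that comfortably contains the solution here, and the function name is hypothetical. With a one-point table the routine collapses to Proposition 1, and for Table 1 it should give a VaR close to the analytic value 29.23 of Table 2 (in units of the exposure of 100).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
DAYS_PER_YEAR = 250


def shp_var_es(table, c, mu_1y=-0.015, sigma_1y=0.30, tol=1e-12):
    """VaR_{H,0,c} and ES_{H,0,c} (Proposition 2, Gaussian case) for a discrete
    holding-period distribution table = [(days, probability), ...].

    The integrals over dF_H reduce to finite sums; VaR is found by bisection.
    """
    comps = [(p, mu_1y * h / DAYS_PER_YEAR, sigma_1y * (h / DAYS_PER_YEAR) ** 0.5)
             for h, p in table]

    def tail_prob(v):
        # P(X_{0+H} - X_0 < -v), a decreasing function of v
        return sum(p * STD_NORMAL.cdf((-v - mu) / sd) for p, mu, sd in comps)

    lo, hi = -1.0, 10.0  # bracket in log-return units (assumed wide enough)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_prob(mid) > 1.0 - c:
            lo = mid  # tail still too heavy: the VaR must be larger
        else:
            hi = mid
    var = 0.5 * (lo + hi)

    # ES = (1/(1-c)) * sum_i p_i [ -mu_i Phi(a_i) + sd_i phi(a_i) ], a_i = (-VaR - mu_i)/sd_i
    es = sum(p * (-mu * STD_NORMAL.cdf((-var - mu) / sd)
                  + sd * STD_NORMAL.pdf((-var - mu) / sd))
             for p, mu, sd in comps) / (1.0 - c)
    return var, es


var, es = shp_var_es([(10, 0.99), (75, 0.01)], c=0.9996)
print(f"VaR = {100 * var:.2f}, ES = {100 * es:.2f}")
```

Bisection is chosen over a derivative-based method because the mixture tail probability is monotone in the VaR level, so a bracketing search is guaranteed to converge.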

We also note that the maximum difference, both in relative and absolute terms,
between ES and VaR is reached by the model under random holding period $H_0$.
Under this model the change in portfolio value shows heavier tails than under a single
deterministic holding period. In order to explore the impact of the tails of the SHP
distribution on the liquidity-adjusted risk, in the following we simulate SHP models with
$H_0$ distributed as an Exponential, an Inverse Gamma¹, and a Generalized

1 Obtained by rescaling a distribution $IG(\nu/2, \nu/2)$ with $\nu = 3$. Before rescaling, setting $\alpha = \nu/2$,
the inverse gamma density is $f(x) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, x^{-\alpha-1} e^{-\alpha/x}$, $x > 0$, $\alpha > 0$, with expected
value $\alpha/(\alpha - 1)$ (for $\alpha > 1$). We rescale this distribution by $k = 8.66/(\alpha/(\alpha - 1))$ and take for $H_0$ the random
variable with density $f(x/k)/k$.
Pareto distribution², with parameters calibrated so that all three distributions have
the same 99 % quantile of 75 business days. The results are in Table 3.

The SHP process changes the statistical nature of the P&L process: the heavier 
the tails of the SHP distribution, the heavier the tails of the P&L distribution. Notice
that the density of our Pareto distribution goes to 0 at infinity as a power with exponent
around 3 (namely $a + 1 \approx 3.07$), as one can see immediately by differentiating the
cumulative distribution function, whereas the density of our inverse gamma goes to 0 at
infinity with exponent about 2.5 (namely $\alpha + 1 = 2.5$).
In this example we have that the tails of the inverse gamma are heavier, and indeed 
for that distribution VaR and ES are larger and differ from each other more. This can 
change of course if we take different parameters in the two distributions. 
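A minimal sketch of the three SHP distributions can be used to check their calibration to a common 99 % quantile of 75 business days. The exponential rate is not stated in the text; here it is assumed to be $\ln(100)/75$, which is what the common-quantile requirement implies. The inverse gamma uses the footnote parameters and is sampled as $k\alpha/G$ with $G \sim \mathrm{Gamma}(\alpha, 1)$; for the Generalized Pareto we take $F(x) = 1 - (k/(k+x))^{a}$ with $k = 9$, $a = 2.0651$, which is consistent with the mean $k/(a-1)$ given in the footnote, and sample it by inverting $F$. Function names are hypothetical.

```python
import math
import random

TARGET_Q99 = 75.0  # common 99% quantile in business days


def sample_exponential(rng):
    # Assumed calibration: rate ln(100)/75 gives P(H > 75) = 1%.
    return rng.expovariate(math.log(100.0) / TARGET_Q99)


def sample_inverse_gamma(rng, nu=3.0, target_mean=8.66):
    # IG(alpha, alpha) with alpha = nu/2, rescaled by k = 8.66 / E[IG] (footnote 1).
    alpha = nu / 2.0
    k = target_mean / (alpha / (alpha - 1.0))
    # If G ~ Gamma(shape alpha, scale 1), then alpha / G ~ IG(alpha, alpha).
    return k * alpha / rng.gammavariate(alpha, 1.0)


def sample_gen_pareto(rng, k=9.0, a=2.0651):
    # Inverse-CDF sampling of F(x) = 1 - (k / (k + x))^a.
    u = rng.random()
    return k * ((1.0 - u) ** (-1.0 / a) - 1.0)


def quantile(xs, q):
    ordered = sorted(xs)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


rng = random.Random(3)
samplers = {"Exponential": sample_exponential,
            "Inverse Gamma": sample_inverse_gamma,
            "Generalized Pareto": sample_gen_pareto}
stats = {}
for name, sampler in samplers.items():
    draws = [sampler(rng) for _ in range(200_000)]
    stats[name] = (quantile(draws, 0.99), sum(draws) / len(draws))
    print(f"{name:18s} 99% quantile = {stats[name][0]:5.1f}, mean = {stats[name][1]:5.2f}")
```

All three quantiles should come out near 75, while the means differ widely (about 16.3, 8.66, and 8.45 in theory), which is exactly why tail behavior rather than the 99 % level drives the differences in liquidity-adjusted risk. Sample means of the inverse gamma converge slowly, since its variance is infinite for $\alpha = 1.5$.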


3 Dependence Modeling: A Bivariate Case 


Within multivariate modeling, using a common SHP for many normally distributed
risks leads to dynamic versions of the so-called normal variance mixtures and normal
mean-variance mixtures [19].

Assumption 2 In this section we assume that different assets have the same random 
holding period, thus testing an extreme liquidity dependence scenario. We will briefly 
discuss relaxing this assumption at the end of this section. We further assume that 
the stochastic holding period process is independent of the log returns of assets in 
deterministic calendar time. 


Let the log returns (recall $X_t^i = \ln V_t^i$, with $V_t^i$ the value at time $t$ of the $i$th asset)

$$X^1_{t+h} - X^1_t, \dots, X^m_{t+h} - X^m_t$$

be normals with means $\mu^1_{t,h}, \dots, \mu^m_{t,h}$ and covariance matrix $Q_{t,h}$.
Then the vector of log returns over the random horizon, with joint distribution function

$$\mathbb{P}\left[X^1_{t+H_t} - X^1_t < x_1, \dots, X^m_{t+H_t} - X^m_t < x_m\right] = \int_0^\infty \mathbb{P}\left[X^1_{t+h} - X^1_t < x_1, \dots, X^m_{t+h} - X^m_t < x_m\right] dF_{H,t}(h),$$

is distributed as a mixture of multivariate normals, and a portfolio $V_t$ of the assets
$1, 2, \dots, m$ whose log-returns $X_{t+h} - X_t$ ($X_t = \ln V_t$) are a linear combination, with weights
$w_1, \dots, w_m$, of the single-asset log-returns $X^i_{t+h} - X^i_t$ would be distributed as

2 With scale parameter $k = 9$ and shape parameter $a = 2.0651$, and cumulative distribution
function $F(x) = 1 - \left(\frac{k}{k + x}\right)^{a}$, $x > 0$, this distribution has finite moments only of order less than $a$. So the smaller
$a$, the fatter the tails. If $a > 1$, the mean is $\mathbb{E}[H_0] = k/(a - 1)$.
$$\mathbb{P}\left[X_{t+H_t} - X_t < z\right] = \int_0^\infty \mathbb{P}\left[w_1\left(X^1_{t+h} - X^1_t\right) + \dots + w_m\left(X^m_{t+h} - X^m_t\right) < z\right] dF_{H,t}(h).$$
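To see the mixture effect on a portfolio, the sketch below simulates the weighted log-return of two assets sharing a common holding period drawn from Table 1. It assumes zero drift, hypothetical daily volatilities of 2 % and 1 %, correlation 0.5, and equal weights; none of these values come from the text. Since the conditional variance of the portfolio is proportional to $H$, its kurtosis should be close to $3\,\mathbb{E}[H^2]/\mathbb{E}[H]^2 \approx 4.1$, against 3 for a fixed 10-day horizon, a simple symptom of the fat tails that such mixtures can generate.

```python
import math
import random


def simulate_portfolio_returns(n, rho, weights, daily_vols, table, rng):
    """Portfolio log-returns w1 (X^1_{H} - X^1_0) + w2 (X^2_{H} - X^2_0) with a common
    random holding period H drawn from table = [(days, probability), ...].

    Assumes zero drift and, conditional on H, jointly Gaussian asset log-returns
    with variances daily_vol_i^2 * H and correlation rho.
    """
    horizons, probs = zip(*table)
    out = []
    for h in rng.choices(horizons, weights=probs, k=n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)  # corr = rho
        scale = math.sqrt(h)
        r1 = daily_vols[0] * scale * z1
        r2 = daily_vols[1] * scale * z2
        out.append(weights[0] * r1 + weights[1] * r2)
    return out


def kurtosis(xs):
    """Sample kurtosis m4 / m2^2 (equal to 3 for a normal distribution)."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2


rng = random.Random(11)
common = simulate_portfolio_returns(200_000, rho=0.5, weights=(0.5, 0.5),
                                    daily_vols=(0.02, 0.01),
                                    table=[(10, 0.99), (75, 0.01)], rng=rng)
fixed = simulate_portfolio_returns(200_000, rho=0.5, weights=(0.5, 0.5),
                                   daily_vols=(0.02, 0.01),
                                   table=[(10, 1.0)], rng=rng)
print(f"kurtosis with common SHP: {kurtosis(common):.2f}, fixed 10 b.d.: {kurtosis(fixed):.2f}")
```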

In particular, in analogy with the unidimensional case, the mixture may generate skewed and fat-tailed distributions; with more than one asset, this has the further implication that VaR is not guaranteed to be subadditive on the portfolio. A risk manager who wants to take SHP into account in such a setting should therefore adopt a coherent risk measure such as Expected Shortfall.
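The mixture representation above translates directly into a Monte Carlo recipe: draw the common holding period first, then draw jointly Gaussian log-returns whose means and covariances scale with that draw. The Python sketch below is illustrative only. It assumes two assets with hypothetical per-day drifts, volatilities, and correlation, fixed portfolio weights, and a Pareto (Lomax) holding period of the form $F(x) = 1 - (k/(k+x))^{\alpha}$ with the footnote's parameters $k = 9$ and $\alpha = 2.0651$, measured in days. It reports the empirical VaR and Expected Shortfall of the portfolio loss.

```python
import math
import random


def sample_lomax(k: float, alpha: float, rng: random.Random) -> float:
    """Inverse-CDF draw from the assumed law F(x) = 1 - (k / (k + x))**alpha, x > 0."""
    u = rng.random()
    return k * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def portfolio_returns_shp(n, weights, mu, sigma, rho, k, alpha, seed=0):
    """Portfolio log-returns under one common stochastic holding period.

    Two assets only, with hypothetical per-day drifts mu[i], volatilities
    sigma[i] and correlation rho. Conditionally on the holding period h, the
    asset log-returns are jointly normal with means mu[i] * h and covariances
    sigma[i] * sigma[j] * rho_ij * h.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        h = sample_lomax(k, alpha, rng)  # common holding period for both assets
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        r1 = mu[0] * h + sigma[0] * math.sqrt(h) * z1
        r2 = mu[1] * h + sigma[1] * math.sqrt(h) * z2
        out.append(weights[0] * r1 + weights[1] * r2)
    return out


def var_es(returns, level=0.99):
    """Empirical VaR and Expected Shortfall of the loss L = -return."""
    losses = sorted(-r for r in returns)
    idx = math.ceil(level * len(losses)) - 1
    tail = losses[idx:]
    return losses[idx], sum(tail) / len(tail)


if __name__ == "__main__":
    rets = portfolio_returns_shp(
        n=100_000, weights=(0.5, 0.5), mu=(0.0, 0.0),
        sigma=(0.01, 0.02), rho=0.3, k=9.0, alpha=2.0651)
    var99, es99 = var_es(rets, 0.99)
    print(f"99% VaR = {var99:.4f}, 99% ES = {es99:.4f}")
```

Replacing the sampled holding period with a constant recovers the usual Gaussian VaR over a deterministic horizon, which gives a simple point of comparison.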

A natural question at this stage is whether the adoption of a common SHP can add 
dependence to returns that are jointly Gaussian under deterministic calendar time, 
perhaps to the point of making extreme scenarios on the joint values of the random 
variables possible. 

Before answering this question, one needs to distinguish extreme behavior in the 
single variables and in their joint action in a multivariate setting. Extreme behavior 
on the single variables is modeled, for example, by heavy tails in the marginal dis- 
tributions of the single variables. Extreme behavior in the dependence structure of, 
say, two random variables is achieved when the two random variables tend to take 
extreme values in the same direction together. This is called tail dependence, and one 
can have both upper tail dependence and lower tail dependence. More precisely, but 
still loosely speaking, tail dependence expresses the limiting proportion according 
to which the first variable exceeds a certain level given that the second variable has 
already exceeded that level. Tail dependence is technically defined through a limit, 
so that it is an asymptotic notion of dependence. For a formal definition we refer, 
for example, to [19]. “Finite” dependence, as opposed to tail dependence, between two random
variables is best expressed by rank correlation measures such as Kendall’s tau or
Spearman’s rho.
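For reference, the standard definitions behind this discussion (see, e.g., [19]) can be written as follows, with $F_1, F_2$ the continuous marginal distribution functions of a pair $(Y_1, Y_2)$ and $(\tilde Y_1, \tilde Y_2)$ an independent copy of that pair. They are stated here only to fix notation; tail dependence is present when $\lambda_U > 0$ or $\lambda_L > 0$.

$$\lambda_U = \lim_{q \uparrow 1} \mathbb{P}\left[Y_2 > F_2^{-1}(q) \mid Y_1 > F_1^{-1}(q)\right], \qquad \lambda_L = \lim_{q \downarrow 0} \mathbb{P}\left[Y_2 \le F_2^{-1}(q) \mid Y_1 \le F_1^{-1}(q)\right],$$

$$\tau = \mathbb{P}\left[(Y_1 - \tilde Y_1)(Y_2 - \tilde Y_2) > 0\right] - \mathbb{P}\left[(Y_1 - \tilde Y_1)(Y_2 - \tilde Y_2) < 0\right], \qquad \rho_S = \operatorname{corr}\left(F_1(Y_1), F_2(Y_2)\right).$$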

We discuss tail dependence first. In case the returns of the portfolio assets are 
jointly Gaussian with correlations smaller than one, the adoption of a common ran- 
dom holding period for all assets does not add tail dependence, unless the commonly 
adopted random holding period has a distribution with power tails. Hence, if we 
want to rely on one of the random holding period distributions in our examples 
above to introduce upper and lower tail dependence in a multivariate distribution for
the asset returns, we need to adopt a common random holding period for all assets
that is Pareto or Inverse Gamma distributed. Exponentials, Lognormals, or discrete
Bernoulli distributions would not work. This follows from properties of the
normal variance-mixture model; see, for example, [19], p. 212, and also Sect. 7.3.3.

A more specific theorem that fits our setup is Theorem 5.3.8 in [23]. In our notation,
it reads as follows.

Proposition 3 (A common random holding period with less than power tails does
not add tail dependence to jointly Gaussian returns) Let $W^i_t = \ln V^i_t$, with $V^i_t$ the value at time $t$ of the $i$th asset, $i = 1, 2$, and assume that the log returns

$$W^1_{t+h} - W^1_t, \qquad W^2_{t+h} - W^2_t$$

are increments of two correlated Brownian motions, i.e., normals with zero means, variances $h$,
and instantaneous correlation less than 1 in absolute value:

$$d\langle W^1, W^2 \rangle_t = \rho \, dt, \qquad |\rho| < 1.$$

Then adding a common nonnegative random holding period $H_0$, independent of the $W$’s,
leads to tail dependence in the returns

$$W^1_{t+H_0} - W^1_t, \qquad W^2_{t+H_0} - W^2_t$$

if and only if $\sqrt{H_0}$ is regularly varying at $\infty$ with index $\alpha > 0$.

Theorem 5.3.8 in [23] also gives an expression for the tail dependence coefficients
as functions of $\alpha$ and of the survival function of the Student t distribution with
$\alpha + 1$ degrees of freedom.
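Proposition 3 can be illustrated numerically. The sketch below uses hypothetical parameters (correlation 0.5, 3 degrees of freedom) and reuses the same Gaussian draws twice: once with a deterministic unit horizon, and once scaled by the square root of a common inverse-gamma holding period, which turns the Gaussian pair into a bivariate Student t with $\nu$ degrees of freedom (power tails). It then estimates the finite-level exceedance probability $\mathbb{P}[Y_2 > u_2 \mid Y_1 > u_1]$ at the empirical 99% quantiles. This is only a crude proxy for $\lambda_U$; a finite-level estimate is not the limit itself.

```python
import math
import random


def empirical_quantile(xs, q):
    """Simple empirical q-quantile (order statistic at index int(q * n))."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]


def conditional_exceedance(y1, y2, q):
    """Estimate P[Y2 > u2 | Y1 > u1] with u_i the empirical q-quantiles."""
    u1, u2 = empirical_quantile(y1, q), empirical_quantile(y2, q)
    hits = [b > u2 for a, b in zip(y1, y2) if a > u1]
    return sum(hits) / len(hits)


def simulate(n=100_000, rho=0.5, nu=3.0, seed=42):
    """Return Gaussian pairs and the same pairs under a common SHP."""
    rng = random.Random(seed)
    g1, g2, t1, t2 = [], [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Common holding period H ~ InvGamma(nu/2, nu/2), i.e. H = 1/G with
        # G ~ Gamma(shape nu/2, scale 2/nu); sqrt(H) * Z is then Student t(nu).
        h = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)
        g1.append(z1)  # deterministic unit horizon
        g2.append(z2)
        t1.append(math.sqrt(h) * z1)  # same draws, common random horizon
        t2.append(math.sqrt(h) * z2)
    return g1, g2, t1, t2


if __name__ == "__main__":
    g1, g2, t1, t2 = simulate()
    print("Gaussian  :", round(conditional_exceedance(g1, g2, 0.99), 3))
    print("Common SHP:", round(conditional_exceedance(t1, t2, 0.99), 3))
```

With an exponential or lognormal holding period in place of the inverse gamma, the exceedance probability would be expected to vanish as the level tends to 1, in line with the proposition, although any finite-level estimate will still be positive.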

Summarizing, if we work with power tails, the heavier the tails of the common 
holding period process H , the more one may expect tail dependence to emerge for the 
multivariate distribution: by adopting a common SHP for all risks, dependence could 
potentially appear in the whole dynamics, in agreement with the fact that liquidity 
risk is a systemic risk. 

We now turn to finite dependence, as opposed to tail dependence. First, we note the
well-known, elementary but important fact that two random variables can be highly
dependent without being tail dependent and, conversely, can be tail dependent while
having small finite dependence. For example, two jointly Gaussian random variables
with correlation 0.999999 are clearly highly dependent on each other, yet they have
no tail dependence, even though a rank correlation measure such as Kendall’s $\tau$
would be about 0.999, very close to 1, the value characteristic of the co-monotonic case.
This is a case with zero tail dependence but very high finite dependence. On the other
hand, take a bivariate Student t distribution with few degrees of freedom and
correlation parameter $\rho = 0.1$. In this case the two random variables have positive
tail dependence, and it is known that Kendall’s tau for the two random variables is

$$\tau = \frac{2}{\pi} \arcsin(\rho) \approx 0.064,$$

which is the same $\tau$ one would get for two standard jointly Gaussian random
variables with correlation $\rho$. This $\tau$ is quite low, showing that one can have positive
tail dependence while having very small finite dependence.

The above examples show that one has to be careful to distinguish large
finite dependence from tail dependence.

A further point of interest in the above examples comes from the fact that the
multivariate Student t distribution can be obtained from the multivariate Gaussian
distribution by adopting a random holding period given by an inverse gamma
distribution (power tails). We deduce the important fact that, in this case, a common
random holding period with power tails adds positive tail dependence but not finite
dependence.

In fact, one can easily prove a more general result by using the tower property
of conditional expectation together with the definition of $\tau$ based on independent copies
of the bivariate random vector whose dependence is being measured. This gives the
following “no go” theorem for increasing Kendall’s tau of jointly Gaussian returns
through common random holding periods, regardless of the tail behavior of the holding period.

Proposition 4 (A common random holding period does not alter Kendall’s tau for
jointly Gaussian returns) Assumptions as in Proposition 3 above. Then adding a
common nonnegative random holding period $H_0$, independent of the $W$’s, leads to the
same Kendall’s tau for

$$W^1_{t+H_0} - W^1_t, \qquad W^2_{t+H_0} - W^2_t$$

as for the two returns

$$W^1_{t+h} - W^1_t, \qquad W^2_{t+h} - W^2_t$$

for a given deterministic time horizon $h$.
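Proposition 4 can be checked by simulation. The sketch below keeps the zero-mean assumption of Proposition 3 and uses hypothetical inputs (correlation 0.5 and, purely for illustration, an exponential holding period with mean 10). It computes the sample Kendall’s $\tau$ of the returns with and without a common random holding period and compares both with the Gaussian closed form $\tau = (2/\pi)\arcsin\rho$. The $O(n^2)$ pair count is deliberately naive; for large samples an $O(n \log n)$ algorithm would be preferable.

```python
import math
import random


def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau (tau-a); ties have probability zero here."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def gaussian_tau(rho):
    """Kendall's tau of a bivariate normal (also of a bivariate t) with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def sample_pairs(n, rho, mean_h, seed=0):
    """Zero-mean correlated returns: unit horizon vs. a common random horizon."""
    rng = random.Random(seed)
    fixed, shp = ([], []), ([], [])
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        h = rng.expovariate(1.0 / mean_h)  # hypothetical holding-period law
        fixed[0].append(z1)
        fixed[1].append(z2)
        shp[0].append(math.sqrt(h) * z1)
        shp[1].append(math.sqrt(h) * z2)
    return fixed, shp


if __name__ == "__main__":
    fixed, shp = sample_pairs(n=1_000, rho=0.5, mean_h=10.0)
    print("closed form :", round(gaussian_tau(0.5), 4))  # 1/3 for rho = 0.5
    print("fixed h     :", round(kendall_tau(*fixed), 4))
    print("common SHP  :", round(kendall_tau(*shp), 4))
```

The same closed form gives `gaussian_tau(0.999999)` of about 0.9991 and `gaussian_tau(0.1)` of about 0.064, the two values used in the examples above.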

Summing up, this result shows that adding further finite dependence through
common SHPs, at least as measured by Kendall’s tau, is impossible if we start
from zero-mean Gaussian returns. A different popular rank correlation measure,
Spearman’s rho, does not coincide for the bivariate t and Gaussian cases, though, so
it is not excluded that dependence could in principle be added through dependent
holding periods, at least if we measure dependence with Spearman’s $\rho_S$. This is
under investigation.

More generally, at least from a theoretical point of view, it could be interesting 
to model other kinds of dependence than the one stemming purely from a common 
holding period (with power tails). In the bivariate case, for example, one could have 
two different holding periods that are themselves dependent on each other in a less 
simplistic way, for example through a common factor structure, rather than being just 
identical. In this case it would be interesting to study the tail dependence implications 
and also finite dependence as measured by Spearman’s rho. 
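The text leaves the common factor structure unspecified. As one hypothetical instance, each holding period could be the sum of a shared component and an idiosyncratic one, $H^i = F + E^i$, with $F, E^1, E^2$ independent. The sketch below assumes exponential components with hypothetical means; it only shows how such dependent but non-identical holding periods could be simulated and checks their correlation against $\mathrm{Var}(F)/(\mathrm{Var}(F) + \mathrm{Var}(E))$. Each $H^i$ would then drive the returns of asset $i$ as in the mixture formula above.

```python
import math
import random


def factor_holding_periods(n, mean_common, mean_idio, seed=0):
    """Hypothetical H^i = F + E^i with F, E^1, E^2 independent exponentials."""
    rng = random.Random(seed)
    h1, h2 = [], []
    for _ in range(n):
        f = rng.expovariate(1.0 / mean_common)  # shared liquidity factor
        h1.append(f + rng.expovariate(1.0 / mean_idio))
        h2.append(f + rng.expovariate(1.0 / mean_idio))
    return h1, h2


def implied_correlation(mean_common, mean_idio):
    """Corr(H^1, H^2) = Var(F) / (Var(F) + Var(E)); an exponential's variance is mean**2."""
    return mean_common ** 2 / (mean_common ** 2 + mean_idio ** 2)


def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    h1, h2 = factor_holding_periods(50_000, mean_common=5.0, mean_idio=5.0)
    print("sample corr :", round(pearson(h1, h2), 3))
    print("implied corr:", implied_correlation(5.0, 5.0))  # 0.5
```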

We will investigate this aspect in further research, but increasing dependence 
may require, besides the adoption of power tail laws for the random holding periods, 
abandoning the Gaussian distribution for the basic assets under deterministic calendar 
time. 

A further aspect worth investigating is the possibility of calculating semi-closed-form
risk contributions to VaR and ES under SHP along the lines suggested in [26],
and of investigating the Euler principle as in [27, 28].

4 Calibration with Liquidity Data 

We are aware that multivariate SHP modeling is a purely theoretical exercise and that 
we just hinted at possible initial developments above. Nonetheless, a lot of financial 
data is being collected by regulators, providers, and rating agencies, together with 
a sustained effort in theoretical and statistical studies. This may eventually make
synthetic indices of liquidity risk available, grouped by region, market, instrument
type, etc. For instance, Fitch already calculates market liquidity indices on CDS
markets worldwide, on the basis of a proprietary scoring model [14].


4.1 Dependencies Between Liquidity, Credit, and Market Risk

It could be an interesting exercise to calibrate the dependence structure (e.g., copula
function) between a liquidity index (such as Fitch’s), a credit index (such as
iTraxx), and a market index (for instance, EURO STOXX 50) in order to measure the
possible (nonlinear) dependence among the three. The risk manager of a bank
could use the resulting dependence structure within the context of risk integration,
first to simulate the joint dynamics and then to estimate the overall
liquidity-adjusted VaR/ES, assuming co-monotonicity between the variations of
the liquidity index and of the SHP processes.
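One common calibration route, sketched here under the assumption that the dependence is summarised by a Gaussian (or, more generally, elliptical) copula, is to estimate pairwise Kendall’s $\tau$ from the joint time series and invert $\tau = (2/\pi)\arcsin\rho$, i.e., $\rho = \sin(\pi\tau/2)$. Because $\tau$ is invariant under strictly increasing transformations of each series, the marginals need not be modelled first. The series names and the synthetic one-factor data below are hypothetical placeholders; with real index changes one should also check that the resulting correlation matrix is positive semi-definite.

```python
import math
import random
from itertools import combinations


def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau (tau-a)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def copula_correlations(series):
    """Pairwise Gaussian-copula correlations via rho = sin(pi * tau / 2)."""
    return {
        (a, b): math.sin(math.pi * kendall_tau(series[a], series[b]) / 2.0)
        for a, b in combinations(list(series), 2)
    }


def synthetic_data(n, seed=0):
    """Hypothetical stand-in data: three series sharing one common factor."""
    rng = random.Random(seed)
    data = {"liquidity": [], "credit": [], "equity": []}
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        for name in data:
            # Unit variance, pairwise correlation 0.6 * 0.6 = 0.36 by construction.
            data[name].append(0.6 * common + 0.8 * rng.gauss(0.0, 1.0))
    return data


if __name__ == "__main__":
    for pair, rho in copula_correlations(synthetic_data(500)).items():
        print(pair, round(rho, 3))
```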


4.2 Marginal Distributions of SHPs 

A lot of information on ‘extreme’ SHP statistics for an OTC derivatives portfolio
could be collected from the statistics, across Lehman’s counterparties, of the time
lags between Lehman’s Default Event Date and the trade dates of any replacement
transaction. The data could give information on the marginal distribution of the SHP
of a portfolio in a stressed scenario, by assuming a statistical equivalence between
data collected ‘across space’ (across Lehman’s counterparties) and ‘across
time’ under an i.i.d. hypothesis. 3 The risk manager of a bank could examine a
more specific and non-distressed dataset by collecting information on the ordinary
operations of the business units.
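The data treatment described above could start as simply as the sketch below: compute, for each counterparty, the lag in calendar days between an assumed default event date and the trade date of its replacement transaction, and pool the lags into an empirical distribution for the stressed SHP. The default date used (15 September 2008, the date Lehman Brothers Holdings filed for Chapter 11) is an assumption, since the relevant contractual date may differ by entity and counterparty, and all trade dates are made-up placeholders. Counting business days instead of calendar days might be preferable.

```python
import math
from datetime import date


def holding_period_lags(default_date, replacement_dates):
    """Sorted calendar-day lags between the default date and each replacement trade."""
    return sorted((d - default_date).days for d in replacement_dates)


def empirical_cdf(lags, x):
    """Fraction of observed lags that are <= x days."""
    return sum(1 for lag in lags if lag <= x) / len(lags)


def empirical_quantile(lags, q):
    """Lower empirical q-quantile of the sorted lags."""
    return lags[max(0, math.ceil(q * len(lags)) - 1)]


# Hypothetical inputs: the default date is an assumption and the trade dates
# are made-up placeholders, not data from any counterparty.
DEFAULT_DATE = date(2008, 9, 15)
REPLACEMENT_DATES = [
    date(2008, 9, 16), date(2008, 9, 19), date(2008, 9, 26),
    date(2008, 10, 3), date(2008, 10, 20),
]

if __name__ == "__main__":
    lags = holding_period_lags(DEFAULT_DATE, REPLACEMENT_DATES)
    print("lags (days):", lags)                           # [1, 4, 11, 18, 35]
    print("mean lag:", sum(lags) / len(lags))             # 13.8
    print("P[SHP <= 11 days]:", empirical_cdf(lags, 11))  # 0.6
    print("median lag:", empirical_quantile(lags, 0.5))   # 11
```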


5 Conclusions 

Within the context of risk integration, in order to include liquidity risk in the whole 
portfolio risk measures, a stochastic holding period (SHP) model can be useful, 
being versatile, easy to simulate, and easy to understand in its inputs and outputs. 
In a single-portfolio framework, introducing an SHP model moves the statistical
distribution of P&L towards possibly heavier-tailed and skewed mixture
distributions. In a multivariate setting, the dependence among the SHP processes to
which the marginal P&Ls are subordinated may induce dependence among the latter under
drastic choices of the SHP distribution and, in general, lead to heavier tails in the total

3 A similar approach is adopted in [21] within the context of operational risk modeling. 
P&L distribution. At present, the lack of synthetic and consensually representative data
forces a qualitative top-down approach, but it is reasonable to expect that
this limitation will be overcome in the near future.
Regulatory Developments in Risk
Management: Restoring Confidence 
in Internal Models 


Uwe Gaumert and Michael Kemmer 


Abstract The paper deals with the question of how to restore lost confidence in 
the results of internal models (especially market risk models). This is an impor- 
tant prerequisite for continuing to use these models as a basis for calculating risk- 
sensitive prudential capital requirements. The authors argue that restoring confidence 
is feasible. Contributions to this end will be made both by the reform of regulatory 
requirements under Basel 2.5 and the Trading Book Review and by refinements of 
these models by the banks themselves. By contrast, capital requirements calculated 
on the basis of a leverage ratio and prudential standardised approaches will not be 
sufficient, even from a regulatory perspective, owing to their substantial weaknesses. 
Specific proposals include standardising models with a view to reducing complexity 
and enhancing comparability, significantly improving model validation and increas- 
ing transparency as to how model results are determined, also over time. The article 
reflects the personal views of the authors. 


1 Introduction 

Since 1997 (“Basel 1.5”), banks in Germany have been allowed to calculate their 
capital requirements for the trading book using internal value-at-risk (VaR) models 
that have passed a comprehensive and stringent supervisory vetting and approval 
process. Basel II and Basel III saw the introduction of further internal models com- 
plementing the standardised approaches already available — take, for example, the 
internal ratings-based (IRB) approach for credit risk under Basel II and the advanced 
credit valuation adjustment (CVA) approach for counterparty risk under Basel III. 
During the financial crisis, particular criticism was directed at internal market risk 
models, the design of which supervisors largely left to the banks themselves. This 
article therefore confines itself to examining these models, which are a good starting 
point for explaining and commenting on the current debate. Much of the following
applies to other types of internal models as well. 

Banks and supervisors learned many lessons from the sometimes unsatisfactory 
performance of VaR models in the crisis — one of the root causes of the loss of confi- 
dence by investors in model results. This led, at bank level, to a range of improvements 
in methodology, and also to the realisation that not all products and portfolios lend 
themselves to internal modelling. At supervisory level, Basel 2.5 ushered in an initial 
reform with rules that were much better at capturing extreme risks (tail risks) and 
that increased capital requirements at least threefold. Work on a fundamental trading 
book review (Basel 3.5), which will bring further methodological improvements to 
regulatory requirements, is also underway. 

Nevertheless, models are still criticised as being 

• too error-prone, 

• suitable only for use in “fair-weather” conditions, 

• too variable in their results when analysing identical risks, 

• insufficiently transparent for investors and 

• manipulated by banks, with the tacit acceptance of supervisors, with the aim of 
reducing their capital requirements. 

As a result, the credibility of model results and thus their suitability for use as a 
basis for calculating capital requirements have been challenged. This culminated 
in, for example, the following statement by the academic advisory board at the 
German Ministry for Economic Affairs: “Behind these flaws (in risk modelling) 1 
lie fundamental problems that call into question the system of model-based capital 
regulation as a whole.” 2 It therefore makes good sense to explore the suitability of 
possible alternatives. The authors nevertheless conclude that model-based capital 
charges should be retained. But extensive efforts are needed to restore confidence in 
model results. 


2 Loss of Confidence in Internal Models — How Did It 
Happen? 

2.1 An Example from the First Years of the Crisis 

In banks’ trading units, the market disruption that accompanied the start of the
financial crisis in the second half of 2007 took the form of sharply falling prices, with a
corresponding impact on their daily P&Ls, after a prolonged phase of low volatility.
Uncertainty grew rapidly about the accuracy of estimated probabilities of default, 
default correlations of the underlying loans and the scale of loss in the event of default, 


1 Wording in brackets inserted by the authors. 

2 [31], p. 19. 
and thus also about the probabilities of default and recovery rates of the securitisation
instruments. This in turn caused spreads to widen, volatility to increase and market 
liquidity for securitisation products to dry up. A major exacerbating factor was that 
many market participants responded in the same way (“flight to simplicity”, “flight 
to quality”). Later on, there were also jump events such as downgrades. Calibrating 
the above parameters proved especially problematic since there was often a lack of 
historical default or market data. Unlike in the period before the crisis, even AAA- 
rated senior or super senior tranches of securitisation instruments, which only start 
to absorb loss much later than their riskier counterparts, suffered considerably in 
value as the protective cushion of more junior tranches melted away, necessitating 
substantial write-downs. 3 

The performance of internal market-risk models was not always satisfactory, espe- 
cially in the second half of 2007 and in the “Lehman year” of 2008. In this period, 
a number of banks found that the daily loss limits forecast by their models were 
sometimes significantly exceeded (backtesting outliers). 4 The performance results 
of Deutsche Bank, for instance, show that losses on some sub-portfolios were evi- 
dently serious enough to have an impact on the overall performance of the bank’s 
trading unit. This demonstrates the extremely strong market disruption which can 
follow an external shock. When backtesting a model’s performance, the current
clean P&L, $\mathrm{P\&L}_t$, is compared with the previous day’s VaR forecast, $\mathrm{VaR}_{t-1}$. 5 At
a confidence level of 99 %, an average of two to three outliers a year may be antic- 
ipated over the long term (representing 1 % of 250-260 trading days a year). In the 
years between 2007 and 2013, Deutsche Bank had 12, 35, 1, 2, 3, 2 and 2 outliers. 6 
Although the models’ performance for 2007 and 2008 looks bad at first sight, the 
question nevertheless arises as to whether or not these outliers are really the models’ 
“fault”, so to speak. By their very nature, models can only do what they have been 
designed to do: “If you’re in trouble, don’t blame your model.” To function properly, 
the models needed liquid markets, adequate historical market data and total coverage 
of all market risks, particularly migration and default risk. These prerequisites were 
not always met by markets and banks. Anyone using a model has to be aware of its 
limitations and exercise caution when working with its results. 
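The “two to three outliers a year” benchmark follows from treating daily exceptions at the 99 % level as independent Bernoulli trials with probability 1 %, so that the annual count is binomial with mean $250 \times 0.01 = 2.5$. The sketch below assumes 250 trading days and independent exceptions, an assumption that clustered outliers in a crisis would violate, and computes the probability of observing at least each of the annual counts reported above.

```python
import math


def binom_sf(k, n, p):
    """P[N >= k] for N ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    days, p = 250, 0.01  # assumed trading days per year and exception probability
    print(f"expected outliers per year: {days * p:.1f}")  # 2.5
    reported = {2007: 12, 2008: 35, 2009: 1, 2010: 2, 2011: 3, 2012: 2, 2013: 2}
    for year, k in reported.items():
        print(year, k, f"P[N >= {k}] = {binom_sf(k, days, p):.2e}")
```

Under these assumptions the 2007 and 2008 counts are extremely unlikely, while the later counts are consistent with a correctly calibrated model; as the text notes, this alone does not settle whether the models or the market conditions were to blame.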

Even Germany’s Federal Financial Supervisory Authority BaFin pointed out that, 
given the extreme combination of circumstances on the market in connection with 
the financial crisis, the figures do not automatically lead to the conclusion that the 
predictive quality of the models is inadequate. 7 The example could indicate that,


3 Cf. [18], p. 128. 

4 [10], p. 8. 

5 Between 2007 and 2009, only so-called “dirty” P&L results were published in chart form, while 
outliers are based on “clean” P&L data. This inconsistency was eliminated in 2010. Dirty and clean 
P&L figures may differ. This is because clean P&L simply shows end-of-day positions revalued 
using prices at the end of the following trading day, whereas dirty P&L also includes income from 
intraday trading, fees and commissions and interest accrued. 

6 [11], Management Report, 2007, p. 88, 2008, p. 98, 2009, p. 85, 2010, p. 95, 2011, p. 104, 2012, 
p. 167, 2013, p. 170. 

7 [16], p. 133. 
since 2009, the bank has been successful in eliminating its models’ weaknesses, at
least at the highest portfolio level. It should nevertheless be borne in mind that market 
phases analysed after 2008 were sometimes quieter and that there has also been some 
reduction in risk. The increasing shift in the nature of the financial crisis from 2010 
towards a crisis concerning the creditworthiness of peripheral European countries, 
which created new market disruption, is most certainly reflected at the highest level of 
the backtesting time series. Particularly large losses were incurred in March and May
2010, but only those in May 2010 led to the two outliers recorded that year. These outliers
may be explained by the fears brewing at the time about the situation of the PIIGS
states. Possibly, the scale of the corresponding trading activities was such that any
problems with the models for these sub-portfolios made themselves felt at the highest 
portfolio level. The weaknesses outlined below were, by the bank’s own testimony,
identified and rapidly addressed. 8 As mentioned above, two to three outliers per year 
represent the number to be expected and are not sufficient, in themselves, to call the 
quality of modelling into question. 

The flaws banks identified in their models following the outbreak of the crisis 
revealed that a variety of areas needed work and improvement. These improvements 
have since been carried out. Some examples of model weaknesses which banks have 
now resolved are 9 : 

1. No coverage of default-risk driven “jump events”, such as rating changes and 
issuer defaults. At the outbreak of the crisis, models often failed to cover the 
growing amount of default risk in the trading book. The introduction of IRC 
models 10 to cover migration and default risk helped to overcome this. 

2. Insufficient coverage of market liquidity risk. It was often not possible to liquidate 
or hedge positions within the ten-day holding period assumed under Basel 1.5. 
This led to risks being underestimated. Basel 2.5 takes account of market liquidity 
risk explicitly and in a differentiated way, at least for IRC models. Full coverage 
will be achieved under Basel 3.5. 

3. Slow response to external shocks (outlier clustering). The introduction of stressed
VaR under Basel 2.5 went a long way towards eliminating the problem of
underestimating risks in benign market conditions. Historical market data for “normal
VaR” are now adjusted daily, while monthly or quarterly adjustments were the 
norm before the crisis. 

4. Insufficient consideration of the risk factors involved in securitisation. As a result, 
models designed for securitisation portfolios may no longer be used to calculate 
capital charges (with the exception of the correlation trading portfolio). Even 
before the rule change, some banks had already decided themselves to stop using 
these models. 

5. Flawed proxy approaches. Prior to the crisis, it was often possible to assign a 
newly introduced product to an existing one and assume the market risk would 


8 Cf. [11], Management Report, 2010, p. 91. 

9 [30], pp. 13-17. 

10 IRC stands for incremental risk charge. This refers to risks such as migration and default risk, 
which were not covered by traditional market risk models before the crisis. 
behave in the same way. During the crisis, this assumption proved to be flawed. 11
The supervisory treatment of such approaches is now much more restrictive. 

6. The approximation of changes in the price of financial instruments cannot
accommodate large price movements (delta-gamma approximations). Full revaluation
of instruments is now standard practice.

7. Missing or flawed scaling to longer time horizons. Scaling practices of this kind,
such as square-root-of-time scaling, are now subject to prudential requirements
to ensure their suitability.
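Item 6 can be illustrated with a single hypothetical position: a European call under Black-Scholes with spot 100, strike 100, one year to expiry, 20 % volatility, and zero rates and dividends (all assumptions for illustration). The sketch compares the full-revaluation P&L with the delta-gamma approximation $\delta\,\Delta S + \tfrac{1}{2}\Gamma(\Delta S)^2$ for a small and a large spot move; the approximation error grows with the size of the move.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1(s, k, t, vol):
    return (math.log(s / k) + 0.5 * vol * vol * t) / (vol * math.sqrt(t))


def bs_call(s, k, t, vol):
    """Black-Scholes call price with zero interest rate and no dividends."""
    d1 = _d1(s, k, t, vol)
    d2 = d1 - vol * math.sqrt(t)
    return s * norm_cdf(d1) - k * norm_cdf(d2)


def bs_delta_gamma(s, k, t, vol):
    """Delta N(d1) and gamma phi(d1) / (s * vol * sqrt(t)) of the call."""
    d1 = _d1(s, k, t, vol)
    return norm_cdf(d1), norm_pdf(d1) / (s * vol * math.sqrt(t))


def compare(s, k, t, vol, shock):
    """Full-revaluation vs delta-gamma P&L for a relative spot shock."""
    ds = s * shock
    full = bs_call(s + ds, k, t, vol) - bs_call(s, k, t, vol)
    delta, gamma = bs_delta_gamma(s, k, t, vol)
    approx = delta * ds + 0.5 * gamma * ds * ds
    return full, approx


if __name__ == "__main__":
    for shock in (-0.01, -0.10, -0.30, -0.50):
        full, approx = compare(100.0, 100.0, 1.0, 0.2, shock)
        print(f"shock {shock:+.0%}: full {full:8.4f}  delta-gamma {approx:8.4f}")
```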

These problems were the basis of the review of market risk rules under Basel 2.5
and, as described above, have been eliminated both by the banks themselves and by
new supervisory requirements. 12 Despite this large-scale and appropriate response,
distrust of internal model results and their use for prudential purposes persisted,
leading to further fundamental discussions. 13


2.2 Divergence of Model Results 

This continuing distrust at the most senior level of the Basel Committee 14 led to the 
commissioning of the Standards Implementation Group for Market Risk (SIG-TB) to 
compare the results generated by the internal models of various banks when applied 
to the same hypothetical trading portfolios (hypothetical portfolio exercise). A major 
point of criticism has always been that internal model results are too variable even if 
the risks involved are the same. In January 2013, the SIG-TB published its analysis. 15 
The following factors were identified as the key drivers of variation: 

• The legal framework: some of the banks in the sample did not have to apply Basel 
2.5. This means the US banks, for instance, supplied data from models that had 
neither been implemented nor approved. Analysis showed that some of these banks 
had significantly overestimated risk, though this did not, in practice, translate into 
higher capital requirements. 

• National supervisory rules for calculating capital requirements: differences were
noted, for example, in the multipliers set by supervisors for converting model 
results into capital requirements. In addition, some supervisors had already 
imposed restrictions on the type of model that could be used and/or set specific 
capital add-ons. 

• Legitimate modelling decisions taken by the banks: among the most important 
of these was the choice of model (spread-based, transition matrix-based) in the 
absence of a market standard for modelling rating migration and default risk (IRC 


11 [18], p. 133. 

12 [21], pp. 59 ff., [25], p. 39. 

13 Cf. Sect. 3. 

14 The precise reasons for this distrust at senior level are not known. 

15 Cf. [6]. 




models). Different assumptions about default correlations also led to different 
results. In VaR and stressed VaR models, major factors were the length of data 
histories (at least one year, no maximum limit), the weighting system, the aggre- 
gation of asset classes and of general and specific market risk, and the decision 
whether to scale a one-day VaR up to ten days or estimate a 10-day VaR directly. 
The choice of stress period for the stressed VaR also played an important role. 16 
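The choice between scaling a one-day VaR to ten days and estimating a ten-day VaR directly can be made concrete. Under i.i.d. zero-mean normal daily returns, $\mathrm{VaR}_{10} = \sqrt{10}\,\mathrm{VaR}_1$. The sketch below uses simulated, hypothetical data with positively autocorrelated daily returns (an AR(1) with coefficient 0.3, an assumption), a setting in which square-root-of-time scaling understates the ten-day risk relative to a direct historical estimate from overlapping ten-day sums. Overlapping windows are themselves correlated, which reduces the precision of the direct estimate.

```python
import math
import random


def ar1_returns(n, phi, sigma, seed=0):
    """Hypothetical daily returns r_t = phi * r_{t-1} + eps_t, started in stationarity."""
    rng = random.Random(seed)
    r = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        r = phi * r + rng.gauss(0.0, sigma)
        out.append(r)
    return out


def hist_var(returns, level=0.99):
    """Historical-simulation VaR: empirical level-quantile of losses."""
    losses = sorted(-x for x in returns)
    return losses[math.ceil(level * len(losses)) - 1]


def ten_day_var(returns, level=0.99, horizon=10):
    """One-day VaR, its sqrt-of-time scaling, and a direct overlapping-window estimate."""
    one_day = hist_var(returns, level)
    scaled = math.sqrt(horizon) * one_day
    sums = [sum(returns[i:i + horizon]) for i in range(len(returns) - horizon + 1)]
    return one_day, scaled, hist_var(sums, level)


if __name__ == "__main__":
    one, scaled, direct = ten_day_var(ar1_returns(20_000, phi=0.3, sigma=0.01))
    print(f"1-day {one:.4f}  sqrt(10)-scaled {scaled:.4f}  direct 10-day {direct:.4f}")
```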

In summary, the differences noted were the result of legitimate decisions taken 
by banks with the approval of supervisors and of variations between supervisory 
approval procedures. There is no evidence to suggest manipulation with the aim of 
reducing capital requirements. Differences can also be explained by variations in 
the applicable legal framework and in the market phase on which the study was 
based. An issue related to the market phase is the length of the observation period 
used. Observation periods of differing lengths will have an impact if, for instance, 
the volatility of market data has changed from high (during a period over one year 
ago) to low (last year). In this example, a bank using a one-year data history will not 
capture the phase of higher volatility. This volatility will, by contrast, most certainly 
be captured by any bank using a longer data history (with the extent also depending 
on the weighting system applied to historical data). 
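The observation-period effect described above can be reproduced with a toy historical-simulation VaR. The sketch assumes two years of hypothetical daily returns, a volatile older year (3 % daily volatility) followed by a calm recent year (1 %), and compares an equally weighted one-year window with an equally weighted two-year window. It also shows one possible age-weighting scheme (weights $\lambda^{\text{age}}$ with an assumed $\lambda = 0.99$), under which the result depends on the decay factor.

```python
import math
import random


def hist_var(returns, level=0.99):
    """Equally weighted historical-simulation VaR (empirical loss quantile)."""
    losses = sorted(-r for r in returns)
    return losses[math.ceil(level * len(losses)) - 1]


def age_weighted_var(returns, level=0.99, lam=0.99):
    """Age-weighted historical simulation: the newest return gets weight 1,
    a return that is a days old gets lam**a (weights are then normalised)."""
    n = len(returns)
    pairs = sorted(((-r, lam ** (n - 1 - i)) for i, r in enumerate(returns)),
                   reverse=True)  # largest losses first
    total = sum(w for _, w in pairs)
    tail = 0.0
    for loss, w in pairs:
        tail += w / total
        if tail >= 1.0 - level:  # accumulated tail probability reaches 1 - level
            return loss
    return pairs[-1][0]


def two_regime_history(seed=0, days=250, vol_old=0.03, vol_new=0.01):
    """Hypothetical history: a volatile older year, then a calm recent year."""
    rng = random.Random(seed)
    old = [rng.gauss(0.0, vol_old) for _ in range(days)]
    new = [rng.gauss(0.0, vol_new) for _ in range(days)]
    return old + new  # chronological order, most recent last


if __name__ == "__main__":
    history = two_regime_history()
    print(f"1-year window : {hist_var(history[-250:]):.4f}")
    print(f"2-year window : {hist_var(history):.4f}")
    print(f"age-weighted  : {age_weighted_var(history):.4f}")
```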

It is also important to note that the study was based on a hypothetical portfolio 
approach at the lowest portfolio level and not on real portfolios. The study does not 
address the inherent weakness of this method. One major weakness is that the test 
portfolios used do not reflect portfolio structures in the real world. Portfolios for 
which banks calculate VaR are normally located at a far higher level in the portfolio 
“tree” and are consequently more diversified. If the portfolios analysed had been 
more realistic, variations would probably have been significantly less marked. 17 

Even if the variation between results can be readily explained and cannot be 
“blamed” on the banks, it is nevertheless difficult to communicate differences of, for 
instance, around 13-29 million euros in the results for portfolio 25, the most highly 
aggregated portfolio. 18 Efforts are most certainly needed to reduce the amount of 
variation by means of further standardisation, even if complete alignment would not 
make good sense (see Sect. 4.2). At first sight, the differences could also be interpreted 
as a quantitative measure of the uncertainty surrounding model results and thus as an 
expression of model risk. Section 4.7 will explore to what extent this is a reasonable 
analysis and whether banks should try to capitalise model risk themselves as things 
stand. As the next section shows, dispensing with internal models for prudential 
purposes would not, by contrast, be the correct response. 


16 Cf. [6], p. 10. 

17 The study by the SIG has now been expanded to cover more complex portfolios, cf. [7]. The 
results are nevertheless comparable. Variation increases with the complexity of the portfolios. In the 
first analysis, this was found to be particularly the case with IRC modelling compared to “normal” 
market risk modelling. 

18 Cf. portfolio 25, [6], p. 27. 
3 Alternatives to Internal Models
3.1 Overview 

Given the difficulties associated with modelling and the variation in results, it is 
legitimate to ask whether model-based, risk-sensitive capital charges should be
dropped altogether. Such a step would, moreover, significantly simplify regulation. 
But it could also be asked whether it would not make more sense to address the 
undoubted weaknesses of internal models by means of the reforms already in place 
or in the pipeline without “throwing the baby out with the bath water”, i.e. should we 
not try to learn from past mistakes instead of just giving up. These questions can best 
be answered systematically by examining to what extent the existing regulatory pro- 
posals could, together or on their own, replace model-based capital charges. There 
are essentially two alternatives under discussion: 

• dropping risk-sensitive capital charges and introducing a leverage ratio as the sole
“risk metric”; 

• regulatory standardised approaches: applying risk-sensitive capital charges while 
abandoning model-based ones. 


3.2 The Leverage Ratio 


An exclusively applicable, binding leverage ratio — defined as the ratio of tier 1 
capital to total assets including off-balance-sheet and derivative positions 19 — is only 
a logical response if it must be assumed that neither banks nor supervisors are capa- 
ble of measuring the risks involved in banking. Advocates of this approach talk of 
the “illusion of the measurability of risk.” 20 They argue that we are in a situation 
of “uncertainty”, not “risk”. Uncertainty in decision theory is characterised by two 
things: neither are all conceivable results known, nor is it possible to assign proba- 
bilities to the results or estimate a probability density function. In this case, it would 
not, for example, be possible to calculate a VaR defined as a quantile of a portfolio 
loss distribution. This is only possible under “risk”. 

The concepts of “uncertainty” and “risk” are, however, abstract, theoretical 
extremes, while the various situations observed in reality usually lie somewhere 
in between. The answer to the question of whether it is more appropriate to assume 
a risk situation or an uncertainty situation is determined above all by the availability 
of the data needed for the model estimate (such as market data or historical default 
data). If, in addition, the risk factors associated with the financial instruments are 
known and taken into account, and if the potential changes in the value of a trading 


19 The most recent revision of the Basel Committee’s definition of the leverage ratio can be found 
in draft form in [4] and, in its final form, in [8]. 

20 Cf. [31], p. 19. 
portfolio can be satisfactorily measured (quality of the stochastic model; as a rule, no
assumption of normally distributed losses), determining a VaR of portfolio losses is likely to be
appropriate. 21 This may be assumed for the vast majority of trading portfolios. Should this
nevertheless not be the case, regulatory standardised approaches, which normally 
require less data to be available, could then be used. Reviewing and adjusting mod- 
els is a never-ending task for banks. The model risks which undoubtedly exist (e.g. 
estimation errors) are also a focus of supervisors’ attention. An awareness of the 
limits of a model and of such model risks does not, however, make the use of models 
obsolete. 22 Because modelling by its very nature always involves a simplification of
reality, quantitative and qualitative model validation is crucial. Supervisors set and
enforce stringent rules for such validation. 23 

Advocates of the “uncertainty approach” propose a so-called heuristic as a “rule 
of thumb” and as a risk metric, at least for supervisors. Leverage ratios with widely 
differing minimum levels have been suggested as a heuristic for ensuring the solvency 
of banks. The levels called for range from 3 to 30 %. 24 As is generally recognised, it 
is not possible to infer a specific minimum level from theory. 

The question of whether a leverage ratio is actually a suitable heuristic for ensur- 
ing solvency has not been satisfactorily answered, however. Empirical studies to 
determine to what extent the leverage ratio is a statistical, univariate risk factor that 
can distinguish between banks that survive and those that fail come to different 
conclusions. 25 Often, no such distinguishing ability can be demonstrated. This may 
have an economic explanation since the leverage ratio, as a vertical metric on the 
liabilities side of the balance sheet, cannot act as a horizontal metric of a bank’s risk- 
bearing capacity by means of which sources of loss (causes of insolvency), which 
are mainly to be found on the assets side of the bank’s balance sheet, are compared 
with a loss-absorbing indicator (capital). This can, by contrast, be accomplished by 
ratios such as the “core tier 1” or “tier 1” capital ratio. If, moreover, a leverage ratio 
were a measure capable of predicting the insolvency of certain types of banks, it 
would probably swiftly cease to be a good measure once it became a binding target 
(Goodhart’s Law). 
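One simple way to measure whether a single ratio "can distinguish between banks that survive and those that fail" is the area under the ROC curve (AUC). The AUC is the probability that a randomly chosen failed bank shows a lower leverage ratio than a randomly chosen surviving bank, and a value of 0.5 means no discriminatory power. The sketch below computes it by counting pairs (the Mann-Whitney form). The metric, the function name and the figures are illustrative assumptions. The empirical studies referred to above may use other methods.

```python
def auc_lower_is_riskier(failed, survived):
    """AUC of an indicator for which LOWER values are meant to signal failure.

    Over all (failed, surviving) pairs, count how often the failed bank has the
    lower value; ties count one half. 1.0 = perfect separation, 0.5 = none.
    """
    if not failed or not survived:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for f in failed:
        for s in survived:
            if f < s:
                score += 1.0
            elif f == s:
                score += 0.5
    return score / (len(failed) * len(survived))


# Hypothetical leverage ratios in % (tier 1 capital / exposure measure).
failed_banks = [3.1, 4.0, 5.2]
surviving_banks = [3.0, 4.0, 4.5, 5.5]
print(round(auc_lower_is_riskier(failed_banks, surviving_banks), 3))  # 0.542
```

A value this close to 0.5 would correspond to the finding that often no distinguishing ability can be demonstrated.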

What is more, the leverage ratio has a very long — and already widely discussed — 
list of drawbacks. 26 These are the points of most relevance here: 

• Perverse incentives and the potential for arbitrage: there are strong incentives to 
make business models more risky. Because assets are measured on a non-risk- 
weighted basis, an AAA investment, for instance, ties up just as much capital as 
does a B investment. 


21 Cf. [19], p. 36. 

22 See footnote 21.

23 See also Sect. 4.6. 

24 Cf., for example, [31], p. 23 (15 % capital ratio), cf. [26], p. 182 (20-30 % capital ratio). Leverage
ratios set at this level would override risk-based standards, thus rendering them obsolete. 

25 Cf., for example, the summarising article [32], pp. 26 f. 

26 Cf., for example, [17] or [20], p. 58. 
• A leverage ratio is by no means “model free”: highly complex valuation models
or even simulation approaches are sometimes needed to measure derivatives on
a mark-to-market basis, for example. In a broader context, this is more or less
true for all balance-sheet valuations. So even a leverage ratio cannot claim to be
the simple, robust rule that proponents of a heuristic approach are looking for. 27

• It makes it impossible to compare capital adequacy across banks. The adequacy 
of a bank’s capital resources cannot be assessed without measuring the associated 
risks. 
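The arbitrage incentive described in the first bullet can be shown with a toy calculation. The sketch assumes two exposures of 100 each and the Basel II standardised risk weights for rated corporate exposures (20 % for AAA to AA-, 150 % below BB-). It also assumes an 8 % minimum ratio of capital to risk-weighted assets and a 3 % minimum leverage ratio. The figures are hypothetical and serve only to show the mechanism.

```python
# Assumed parameters (illustrative; see the lead-in).
RISK_WEIGHTS = {"AAA": 0.20, "B": 1.50}  # standardised corporate risk weights
MIN_LEVERAGE_RATIO = 0.03                # tier 1 capital / exposure measure
MIN_CAPITAL_RATIO = 0.08                 # capital / risk-weighted assets


def capital_required(exposure, rating):
    """Return (leverage-ratio requirement, risk-based requirement)."""
    leverage_req = MIN_LEVERAGE_RATIO * exposure  # ignores the rating entirely
    risk_based_req = MIN_CAPITAL_RATIO * RISK_WEIGHTS[rating] * exposure
    return leverage_req, risk_based_req


for rating in ("AAA", "B"):
    lev, rb = capital_required(100.0, rating)
    print(f"{rating}: leverage ratio {lev:.1f}, risk-based {rb:.1f}")
# AAA: leverage ratio 3.0, risk-based 1.6
# B: leverage ratio 3.0, risk-based 12.0
```

Under the leverage ratio alone, both assets tie up 3.0 of capital. As a result, the riskier and usually higher-yielding B asset becomes relatively attractive. Only the risk-based requirement separates the two (1.6 versus 12.0).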

For these and other reasons not mentioned here, the international banking community 
continues to reject the leverage ratio as a sole indicator and as a binding limit. At most, 
it may make sense to monitor changes in a bank’s leverage ratio, but not its absolute 
level; this is the approach of the German Banking Act at present. 28 Supervisors
have widely differing views on the leverage ratio. Even Haldane and Madouros (Bank
of England), in their famous “The dog and the frisbee” speech, by no means call for
a leverage ratio on its own or for a minimum leverage ratio set so high that
risk-based requirements are overridden and therefore indirectly rendered obsolete
(leverage ratio as a frontstop instead of the Basel backstop). Owing to the massive
perverse incentives which they too have noted, they speak instead of placing leverage
ratios on an equal footing with capital ratios. 29


3.3 Regulatory Standardised Approaches 

Standardised approaches, i.e. approaches which spell out in detail how to calculate 
capital requirements on the basis of prudential algorithms (“supervisory models”), 
will always be needed for smaller banks which cannot or do not wish to opt for internal 
models. But larger banks need standardised approaches too — as a fallback solution 
if their internal models are or become unsuitable for all or for certain portfolios. 
Having said that, a standardised approach alone is by no means sufficient for larger 
banks; the reasons are as follows: 30

• It is invariably true of a standardised approach that “one size does not fit all banks”. 
Since a standardised approach is not tailored to an individual bank’s portfolio 
structure, it cannot measure certain risks (such as certain basis risks) or can only 


27 The discussion about a suitable definition of the leverage ratio also shows that improved definitions 
invariably lead to significantly greater complexity, cf. [8]. 

28 Cf. Section 24 (1) (16) and (1a) (5) of the German Banking Act [27].

29 Cf. [24], p. 19: “The case against leverage ratios is that they may encourage banks to increase 
their risk per unit of assets, reducing their usefulness as an indicator of bank failure — a classic 
Goodhart’s Law. Indeed, that was precisely the rationale for seeking risk-sensitivity in the Basel 
framework in the first place. A formulation which would avoid this regulatory arbitrage, while 
preserving robustness, would be to place leverage and capital ratios on a more equal footing.” A 
leverage ratio of at least 7 % would be necessary for this purpose, in the authors’ view. 

30 Cf. [19], p. 37. 
do so very inaccurately. It is normally much less risk-sensitive than an internal
model. 

• A related problem is that a standardised approach usually works only with com- 
paratively simple portfolios. This results in risk being overstated or understated. 

• It normally fails to capture diversification or hedging effects satisfactorily. 

• Standardised approaches can thus be more dangerous than internal models because 
it is often easy to “game the system”. Trading revenue, for instance, can be generated
seemingly without risk, enabling trading units to inflate risk-adjusted earnings. 31

• If internal models are no longer used, supervisors will also have to dispense with 
banks’ risk-management expertise. 

• Standardised approaches are simple models. But as all proposals for standardised 
approaches to date have shown, supervisors are by no means better at constructing 
models than are the banks themselves. 
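The possibility of "gaming the system" through basis risk (footnote 31) can be illustrated with a deliberately simple, hypothetical case. Two distinct risk factors (say, two similar but not identical credit spreads) are mapped to one regulatory risk factor. Offsetting positions then net to zero under the mapped calculation, although the two factors do not move one for one. The sensitivities, the mapping, the volatilities and the correlation are invented. The "internal model" here is a plain two-factor delta-normal VaR, not any bank's actual model.

```python
import math

# Hypothetical position sensitivities: P&L per unit move of each risk factor.
sensitivities = {"spread_A": 1_000.0, "spread_B": -1_000.0}
# Hypothetical supervisory mapping: both factors share one regulatory factor.
regulatory_map = {"spread_A": "credit_spread", "spread_B": "credit_spread"}


def standardised_net_sensitivity(sens, mapping):
    """Net the sensitivities per regulatory risk factor (basis risk is lost)."""
    net = {}
    for factor, s in sens.items():
        key = mapping[factor]
        net[key] = net.get(key, 0.0) + s
    return net


def delta_normal_var(s_a, s_b, vol_a, vol_b, rho, z=2.326):
    """Two-factor delta-normal VaR; z approximates the 99 % normal quantile."""
    variance = ((s_a * vol_a) ** 2 + (s_b * vol_b) ** 2
                + 2 * rho * s_a * vol_a * s_b * vol_b)
    return z * math.sqrt(max(variance, 0.0))


net = standardised_net_sensitivity(sensitivities, regulatory_map)
print(net)  # {'credit_spread': 0.0}: the trade looks riskless
print(delta_normal_var(1_000.0, -1_000.0, 1.0, 1.0, rho=0.9))
```

With a correlation of 0.9, the delta-normal VaR is 2.326 × √200,000 ≈ 1,040, whereas the mapped net sensitivity is zero. Only at a correlation of exactly 1 would the hedge be perfect.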

A further alternative would be scenario-based approaches, which are often relatively 
similar to models, such as those which may currently be used for calculating capital 
charges for options under the standardised approach to market risk (scenario matrix 
approach). This alternative, though definitely worth considering, is not being dis- 
cussed at present and will therefore be only briefly explored in this article. Scenario 
approaches may be regarded as a kind of “halfway house” between risk-sensitive
standardised approaches and internal models. If they are prescribed as a regulatory 
standardised approach, they may also demonstrate the weaknesses of standardised 
approaches described above. The key criteria for evaluating such approaches are the 
scenario generation technique and the process/algorithm used for calculating val- 
uation adjustments on the basis of the scenarios. An especially critical question is 
to what extent the (tail) loss risk of the instruments and portfolios concerned can 
be captured. At one end of the spectrum are approaches that merely differentiate 
between a few scenarios (e.g. base case and adverse case) and make no attempt to 
estimate a loss distribution. At the other extreme are internal models which simulate
such a large number of scenarios that it is possible to estimate a loss distribution on 
the basis of which a parameter such as VaR or expected shortfall can be calculated. 
Another important question is whether or not the scenario generation takes account 
of stressed environmental conditions. 
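The two ends of this spectrum can be contrasted in a few lines. The sketch assumes a single hypothetical long position whose value moves linearly with one normally distributed risk factor. The "few scenarios" variant evaluates a base case and one adverse shift of three standard deviations (an arbitrary choice). The simulation variant draws many scenarios and reads VaR and expected shortfall off the empirical loss distribution. The function names, the parameters and the fixed seed are illustrative.

```python
import random


def few_scenario_loss(exposure, vol, adverse_sigmas=3.0):
    """Worst loss over a base case (no move) and one adverse case."""
    scenarios = {"base": 0.0, "adverse": -adverse_sigmas * vol}  # factor moves
    return max(-exposure * move for move in scenarios.values())


def simulated_var_es(exposure, vol, alpha=0.99, n=100_000, seed=42):
    """Estimate VaR and ES from n simulated scenarios of one normal risk factor."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    rng = random.Random(seed)
    losses = sorted(-exposure * rng.gauss(0.0, vol) for _ in range(n))
    k = int(alpha * n)   # simple estimator of the alpha-quantile position
    var = losses[k]
    tail = losses[k:]    # VaR and the losses beyond it
    return var, sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical long position of 1,000,000 with 1 % daily factor volatility.
    print(few_scenario_loss(1_000_000, 0.01))
    print(simulated_var_es(1_000_000, 0.01))
```

With these inputs, the simulated 99 % VaR and ES should come out close to their normal-theory values of roughly 23,300 and 26,700. The two-scenario approach, by contrast, yields a single loss of about 30,000 with no statement about its probability.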

To sum up, standardised approaches usually have considerable failings when 
it comes to measuring risk, especially the risk associated with large-scale, com- 
plex trading activities. On their own, they are not an adequate basis on which to 
determine appropriate capital requirements. 32 So it may be concluded at this point


31 One example: when supervisors set risk factors in the standardised approach model, basis risk is 
often ignored because different risk factors are (and must be) mapped to the same regulatory risk 
factor. This is part of the model simplification process. It is often easy to design a trade to exploit 
the “difference”. 

32 The outlined shortcomings of standardised approaches also mean they have only limited suitability 
as a floor for model-based capital requirements. Contrary to what is sometimes claimed, model risk 
would therefore not be reduced by a floor. 
that, together or separately, a leverage ratio and standardised approaches are inappropriate
and insufficient from a supervisory perspective. Internal models must remain
the first choice. Nevertheless, confidence in internal models needs to be significantly 
strengthened. 


4 Ways of Restoring Confidence 

4.1 Overview 

The first, important step should be to standardise supervisory approval processes to 
eliminate this major source of variation. A single set of approval and review standards 
should be developed for application worldwide. A globally consistent procedure 
needs to be enforced for granting and withdrawing permission to use models. With 
activities of this kind, supervisors themselves could make a significant contribution 
to restoring confidence. 33 

A number of further proposals are also under discussion at present. Together, they 
have the potential to go a long way towards winning back trust: 

a. Reducing the variation in model results through standardisation (Sect. 4.2). 

b. Enhancing transparency (Sect. 4.3). 

c. Highlighting the positive developments as a result of the trading book review 
(Sect. 4.4). 

d. Strengthening the use test concept (Sect. 4.5). 

e. A comprehensive approach to model validation (Sect. 4.6). 

f. Quantification and capitalisation of model risk (Sect. 4.7). 

g. Voluntary commitment by banks to a code of “model ethics” (Sect. 4.8). 

h. Other approaches (Sect. 4.9). 

4.2 Reducing the Variation in Model Results Through 
Standardisation 


First of all, however, it is important to be aware of the dangers of excessive standardisation: 34


33 For example: the range of multipliers (“3 + x” multiplier), which convert model results into 
capital requirements, and the reasons for their application differ widely from one jurisdiction to 
another. 

34 The Basel Committee is already trying to find a balance between the objectives of “risk sensitiv- 
ity”, “complexity” and “comparability”. Standardisation has the potential to reduce the complexity 
of internal models and increase their comparability. Against that, increasing the complexity of 
standardised approaches often improves comparability. See [5, 22] on the balancing debate. 
• Standardised models can pose a threat to financial stability because they encourage
all banks to react in the same way (herd behaviour). Model diversity is a desirable 
phenomenon from a prudential point of view since it generates less procyclicality. 

• Standardised models would frequently be unsuitable for internal use at larger 
banks, which would consequently need to develop alternative models for internal 
risk management purposes. As a result, the regulatory model would be maintained 
purely for prudential purposes (in violation of the use test; see below). This would 
encourage strategies aimed at reducing capital requirements since the results of 
this model would not have to, and could not, be used internally. 

• It is therefore in the nature of models that a certain amount of variation will 
inevitably exist. 

Nonetheless, it is most certainly possible to standardise models in a way which will
reduce their complexity and improve the comparability of their results but will not
compromise their suitability for internal use. Here are a few suggestions: 35

• Develop a market standard for IRC models to avoid variation as a result of differ- 
ences in the choice of model (proposed standard established by supervisors: see 
Trading Book Review). 

• Reduce the amount of flexibility in how historical data are used. For the standard 
VaR, one year should be not just the minimum but both the minimum and max- 
imum period. This may well affect different banks in different ways, sometimes 
increasing capital requirements and sometimes reducing them. 

• Standardise the stress period for stressed VaR. The period should be set by super- 
visors instead of being selected by banks. True, this means the stress period would 
no longer be optimally suited to the individual portfolio in question. But as the 
study by the Basel Committee’s SIB-TB has shown, similar periods may, as a 
result of the financial crisis, be considered relevant at the highest portfolio level — 
namely the second half of 2008 (including Lehman insolvency) and the first half 
of 2009. 36 
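The first and third suggestions amount to fixing the data windows rather than letting each bank choose them. The following is a minimal historical-simulation sketch built on three assumptions:

- daily P&L is available as (date, P&L) pairs;
- "one year" means the latest 250 observations, which serves as both minimum and maximum;
- the supervisor supplies the stress window as a date range.

The names and the synthetic history in the usage example are hypothetical, and calendar days stand in for trading days.

```python
from datetime import date, timedelta

ONE_YEAR = 250  # assumed observations in "one year": both minimum and maximum


def hs_var(pnl, alpha=0.99):
    """Historical-simulation VaR of a list of daily P&L figures (gains > 0)."""
    losses = sorted(-x for x in pnl)
    n = len(losses)
    for i, loss in enumerate(losses):
        if (i + 1) / n >= alpha:
            return loss
    return losses[-1]


def standard_var(history, alpha=0.99):
    """VaR on exactly the latest ONE_YEAR (date, pnl) observations."""
    if len(history) < ONE_YEAR:
        raise ValueError("history shorter than one year")
    recent = sorted(history)[-ONE_YEAR:]
    return hs_var([pnl for _, pnl in recent], alpha)


def stressed_var(history, stress_start, stress_end, alpha=0.99):
    """Stressed VaR over a supervisor-set window [stress_start, stress_end]."""
    window = [pnl for d, pnl in history if stress_start <= d <= stress_end]
    if not window:
        raise ValueError("no observations in the stress window")
    return hs_var(window, alpha)


if __name__ == "__main__":
    # Hypothetical history: a loss of 1 every day, plus a loss of 100 on every
    # 20th day between July 2008 and June 2009.
    d0 = date(2008, 1, 1)
    history = []
    for i in range(800):
        d = d0 + timedelta(days=i)
        in_stress = date(2008, 7, 1) <= d <= date(2009, 6, 30) and i % 20 == 0
        history.append((d, -100.0 if in_stress else -1.0))
    print(standard_var(history))                                        # 1.0
    print(stressed_var(history, date(2008, 7, 1), date(2009, 6, 30)))   # 100.0
```

In the example, the one-year window lies after the stress period and shows a VaR of 1.0. The supervisor-set window, by contrast, picks up the clustered large losses.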


4.3 Enhancing Transparency 


Much could also be done to improve transparency. Banks could disclose their 
modelling methodologies in greater detail, and explain — for example — why changes 
made to their models have resulted in reduced capital charges. Transparency of this 
kind will significantly benefit informed experts and analysts. These experts will then 
be faced with the difficult challenge of preparing their analyses in such a way as to 
be accessible to the general public. The public at large cannot be expected to be the 
primary addressees of a bank’s disclosures. Someone without specialist knowledge is 
unlikely to be able to understand a risk report, for instance. Nor is it the task of banks 


35 Cf. [23]. 

36 Cf. [6], p. 50. 
to write their reports in a manner that makes such specialist knowledge unnecessary.
This is, however, by no means an argument against improving transparency. 

The work of the Enhanced Disclosure Task Force (EDTF) is also a welcome 
contribution 37 and some banks have already implemented its recommendations in 
their trading units voluntarily. A slide from Deutsche Bank’s presentation for
analysts on 31 January 2013 is just one illustration. 38 This explains, in particular, the 
changes in market-risk-related RWAs (mRWA flow), i.e. it is made clear what brought 
about the reduction in capital requirements in the trading area. The reasons include 
reduced multipliers (for converting model results into capital requirements) on the 
back of significantly better review results, approval of models (IRB approach, IMM) 
for some additional products and the consideration of additional netting agreements 
and collateral in calculations of capital requirements. 
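The multipliers referred to here (the "3 + x" factor in footnote 33) link capital requirements to backtesting results. The sketch assumes the Basel traffic-light plus factors for 250 daily observations of a 99 % VaR: green zone up to 4 exceptions, yellow zone 5 to 9, red zone from 10. It also assumes a Basel 2.5-style charge in which VaR and stressed VaR each enter as the larger of their latest value and the multiplier times their 60-day average. For simplicity, one multiplier is applied to both terms, and any qualitative add-ons set by national supervisors are ignored.

```python
# Plus factor x by number of exceptions in 250 days of 99 % VaR backtesting.
YELLOW_ZONE_PLUS = {5: 0.40, 6: 0.50, 7: 0.65, 8: 0.75, 9: 0.85}


def multiplier(exceptions):
    """Return the '3 + x' multiplier for a given number of backtesting exceptions."""
    if exceptions < 0:
        raise ValueError("number of exceptions cannot be negative")
    if exceptions <= 4:      # green zone
        plus = 0.0
    elif exceptions >= 10:   # red zone
        plus = 1.0
    else:                    # yellow zone
        plus = YELLOW_ZONE_PLUS[exceptions]
    return 3.0 + plus


def market_risk_charge(var_history, svar_history, exceptions):
    """Basel 2.5-style charge from daily VaR and stressed VaR (latest value last)."""
    m = multiplier(exceptions)

    def term(history):
        last_60 = history[-60:]
        return max(history[-1], m * sum(last_60) / len(last_60))

    return term(var_history) + term(svar_history)


if __name__ == "__main__":
    # Hypothetical flat VaR of 10 and stressed VaR of 25 over the last 60 days.
    print(market_risk_charge([10.0] * 60, [25.0] * 60, exceptions=3))  # 105.0
    print(market_risk_charge([10.0] * 60, [25.0] * 60, exceptions=7))
```

On these hypothetical figures, moving from seven exceptions to three lowers the charge from 127.75 to 105.0. This shows how strongly better backtesting results can feed through to capital requirements.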

Another possible means of improving transparency would be to disclose the his- 
tory of individual positions with a certain time lag. Serious discussion is nevertheless 
called for to determine at what point the additional cost of transparency incurred by 
banks would exceed the additional benefit for stakeholders. From an economic per- 
spective, this may be regarded as a transparency ceiling. 


4.4 Highlighting the Positive Developments as a Result 
of the Trading Book Review 


The Basel Committee is currently working on a fundamental review of how capital 
requirements should be calculated for trading book exposures. 39 It has taken criticism 
of the existing regime on board and proposes to reduce the leeway granted to banks 
in the design of their internal models. Without going into the Committee’s extensive 
analysis in detail, here are some key elements of relevance to the questions examined 
in this article: 

• Expected shortfall is to be introduced as a new risk metric calibrated to a period 
of market stress. The intention is to switch to a coherent measure of risk which 
can take better account of tail risk. 40 The reference to a stress period is intended 
to address the issue of “fair-weather models” (the problem facing the turkey in 
Taleb’s “The Black Swan”). 

• A so-called desk approach is to be introduced for granting and withdrawing 
approval for models. In the future, model approval is to be decided on a 
case-by-case basis at trading desk level. This will enable portfolios which are 
illiquid and/or cannot easily be modelled to be excluded from the model’s scope. 


37 Cf. [13]. Recommendations for market risk (nos 22-25), cf. pp. 12, 51-55. 

38 Cf. [12], p. 23. 

39 Cf. [2, 3]. 

40 Cf. [1], p. 203. 
• Model validation will take place at desk level and become even more stringent
through backtesting and a new P&L attribution process. This will significantly 
improve the validation process. At the same time, it will have the effect of raising 
the barriers to obtaining supervisory approval of internal models. 

• All banks using models will also have to calculate requirements using the stan- 
dardised approach. Supervisors take the view that the standardised approach can 
serve as a floor, or even a benchmark, for internal models (the level of the floor has 
not yet been announced). This may provide a further safety mechanism to avoid 
underestimating risk, even if the standardised approach does not always produce 
sound results (see above). 
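Why expected shortfall "can take better account of tail risk" can be seen on two loss samples that agree up to the VaR but differ beyond it. The sketch uses one simple empirical estimator, in which ES is the mean of the losses ranked above VaR. It uses a 95 % level so that the tail has five observations. The samples are hypothetical and stand in for losses from a stress period. The Basel Committee's own calibration is not modelled.

```python
def var_es(losses, alpha):
    """Empirical VaR and expected shortfall at level alpha (losses positive).

    VaR: the smallest sample loss l with at least a fraction alpha of the
    sample <= l. ES: the mean of the losses ranked above that VaR (or VaR
    itself if no loss is ranked above it).
    """
    if not losses or not 0 < alpha < 1:
        raise ValueError("need a non-empty sample and 0 < alpha < 1")
    ordered = sorted(losses)
    n = len(ordered)
    k = next(i for i in range(n) if (i + 1) / n >= alpha)
    tail = ordered[k + 1:] or [ordered[k]]
    return ordered[k], sum(tail) / len(tail)


# Hypothetical stress-period losses: identical up to VaR, different tails.
moderate_tail = list(range(1, 101))                          # 1, 2, ..., 100
heavy_tail = list(range(1, 96)) + [200, 300, 400, 500, 600]

for name, sample in (("moderate", moderate_tail), ("heavy", heavy_tail)):
    print(name, var_es(sample, alpha=0.95))
# moderate (95, 98.0)
# heavy (95, 400.0)
```

Both samples have the same VaR of 95, so a VaR-based charge would treat them identically. Expected shortfall (98.0 versus 400.0) reflects the much heavier tail of the second sample.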


4.5 Strengthening the Use Test Concept 


Up to now, approval of internal models has been dependent, among other things, 
on supervisors being convinced that the model is really used for internal risk man- 
agement purposes. Banks consequently have to demonstrate that the model they 
have submitted for supervisory approval is their main internal risk management tool. 
Basically, they have to prove that the internal model used to manage risk is largely 
identical to the model used to calculate capital charges (use test). The rationale behind 
this sensible supervisory requirement is that the quality of these risk measurement 
systems can best be ensured over time if the internal use of the model results is an 
absolute prerequisite of supervisory approval. As a result of the use test, the bank’s 
own interests are linked to the quality of the model. The design of the model should 
on no account be driven purely by prudential requirements. Moreover, the reply to 
the question of how model results are used for internal risk management purposes 
shows what shape the bank’s “risk culture” is in. 

The use test concept has been undermined, however, by a development towards 
more prudentially driven models which began under Basel 2.5 and is even more 
pronounced under Basel 3.5. This trend should be reversed. At a minimum, the core 
of the model should be usable internally — that is to say be consistent with the bank’s 
strategies for measuring risk. Conservative adjustments can then be made outside the 
core. 

4.6 A Comprehensive Approach to Model Validation 


It should be borne in mind that conventional backtesting methods cannot be per- 
formed on IRC models. Instead, the EBA has issued special guidelines based on 
indirect methods such as stress tests, sensitivity and scenario analyses. 41 
A distinction therefore needs to be made between “normal” market risk models 


41 Cf. [14], pp. 15 f.
and IRC models. Though validation standards already exist for IRC models, they
can by no means be described as comprehensive. 

For normal market risk models, a comprehensive approach going beyond purely 
quantitative backtesting and the P&L attribution process could be supported by banks 
themselves. Proposals to this effect are already on the table at the Federal Financial 
Supervisory Authority (BaFin). 42 It would be worth examining whether the minimum 
requirements for the IRB approach could make an additional contribution. These 
minimum requirements already pursue a comprehensive quantitative and qualitative 
approach to validation, though a number of problems would need to be resolved
before they could be applied to the area of market risk. 43


4.7 Quantification and Capitalisation of Model Risk 

A further approach might be to quantify and capitalise model risk either in the form 
of a capital surcharge on model results under pillar 1 or as an additional risk category 
under pillar 2. 

It would be worthwhile discussing the idea of using the diverging result inter- 
val of the hypothetical portfolio exercise (see Sect. 2.2) as a quantitative basis for 
individual capital surcharges. This may be regarded as prudential benchmarking. 44
The portfolios tested in this exercise do not, however, correspond to banks’ real 
individual portfolios, which makes them a questionable basis for individual capital 
surcharges. As explained above in Sect. 2.2, moreover, it cannot be concluded that 
the differences are largely due to model weaknesses. The question of how to derive 
the differences actually due to model risk from the observed “gross” differences is 
yet to be clarified and will probably be fraught with difficulties. What is more, model 
risk is not reflected solely in the differences in model results (see below on the nature 
of model risk, which also covers the inappropriate use of model results, for example, 
which can result in flawed management decisions). 

This raises the question as to whether it may be better to address model risk under 
pillar 2. If model risk is assumed to arise, first, when statistical models are not used 
properly and, second, from an inevitable uncertainty surrounding key features of 
models, then it is likely to be encountered above all in the areas of 

• design (model assumptions concerning the distribution of market risk parameters 
or portfolio losses, for example), 

• implementation (e.g. the approximation assumptions necessary for IT purposes), 

• internal processes (e.g. complete and accurate coverage of positions, capture of 
market data, valuation models at instrument level [see below]) and IT systems 
used by banks to estimate risk, and 


42 Cf. [9], pp. 38-49. 

43 Cf. Articles 174, 185 CRR [29]. 

44 The EBA is currently preparing a regulatory technical standard to this effect under Article 78 of 
CRD IV. 
• model use. 45

The authors take the view that solving the question of how to quantify model risk for 
the purpose of calculating capital charges is a process very much in its infancy and 
that it is consequently too soon for regulatory action in this field. As in other areas, 
risk-sensitive capital requirements should be sought; one-size-fits-all approaches,
like that called for by the Liikanen Group, should not be pursued because they 
usually end up setting perverse incentives. 

This point notwithstanding, there are already rigid capital requirements for trad- 
ing activities under pillar 1 which address model risk, namely in the area of prudent 
valuation. These require valuation adjustments to be calculated on accounting mea- 
surements of fair value instruments (additional valuation adjustments, AVAs) and 
deducted from CET1 capital. This creates a capital buffer to cover model risk associ- 
ated with valuation models at instrument level (see above). 46 Valuation risk arising 
from the existence of competing valuation models and from model calibration is 
addressed by the EBA standard. Deductions for market price uncertainty (Article 8 
of the EBA RTS) can also be interpreted as charges for model risk, even if the EBA 
does not itself use the term. 
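For market price uncertainty, the AVA can be thought of as the gap between fair value and a prudent value chosen from a range of plausible exit prices. The sketch assumes that such a range is available as a sample of prices (for instance dealer quotes; the figures are hypothetical). It uses an assumed 90 % target level of certainty. The function names are hypothetical, and the aggregation rules of the EBA standard are not modelled.

```python
def prudent_price(plausible_prices, long_position, certainty=0.90):
    """Exit price the bank is at least `certainty` confident of achieving.

    Long: the highest sample price p such that at least `certainty` of the
    plausible prices are >= p. Short: the lowest p such that at least
    `certainty` of them are <= p.
    """
    if not plausible_prices or not 0 < certainty <= 1:
        raise ValueError("need prices and 0 < certainty <= 1")
    prices = sorted(plausible_prices)
    n = len(prices)
    if long_position:
        idx = max(i for i in range(n) if (n - i) / n >= certainty)
    else:
        idx = min(i for i in range(n) if (i + 1) / n >= certainty)
    return prices[idx]


def mpu_ava(fair_value, quantity, plausible_prices, certainty=0.90):
    """Market price uncertainty AVA: (fair - prudent) x quantity, floored at 0."""
    prudent = prudent_price(plausible_prices, quantity > 0, certainty)
    return max(0.0, (fair_value - prudent) * quantity)


# Hypothetical quotes for one instrument with a fair value of 100.0.
quotes = [98.0, 99.0, 99.5, 100.0, 100.0, 100.5, 101.0, 101.5, 102.0, 103.0]
print(mpu_ava(100.0, 1_000, quotes))   # 1000.0
print(mpu_ava(100.0, -1_000, quotes))  # 2000.0
```

For a long position of 1,000 units, the prudent price is 99.0 and the AVA 1,000. For a short position of the same size, the prudent price is 102.0 and the AVA 2,000, reflecting the asymmetric quotes.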


4.8 Voluntary Commitment by Banks to a Code of “Model 
Ethics” 


A commitment could be made to refrain from aggressive or inappropriate modelling 
with the sole aim of minimising capital requirements. Under such a commitment, banks would voluntarily exclude
portfolios, such as certain (though by no means all) securitisation portfolios, from the
scope of their model if these tend to generate questionable results. This may be regarded
as a sub-item of the model validation issue. The desk approach under Basel 3.5
will help to put this new culture into practice. Since capital requirements will have 
to be calculated using the standardised approach as well as the IMA, any aggressive 
modelling should be exposed. At a minimum, banks will have to demonstrate that 
the standardised approach overstates risk in the portfolio in question. If this cannot 
be demonstrated, a case of excessively aggressive modelling may be assumed. 

4.9 Other Approaches 

Other approaches to restoring confidence also deserve a brief mention: 

• further incentives to use models appropriately 

• opening up of access to trade repository data 

• review of models by auditors 


45 Cf. [28], pp. 20-23. 

46 Cf. [15], p. 20, Art. 11. 
• more stringent new product introduction (NPI) processes.

In addition to the code of “model ethics” discussed in Sect. 4.8, the following addi-
tional incentive to use models appropriately could be considered. Establishing a link 
between traders’ bonuses and model backtesting results could serve to improve the 
alignment of interests. This idea is also closely connected with the issue of strength- 
ening the use test concept (see Sect. 4.5). 

Trade repositories already collect key data, including calculated market values, 
relating to all derivative contracts, irrespective of whether they are centrally cleared 
or not. As things stand, banks have no way of accessing the data of other banks. If 
access were made possible at an anonymised level, for example, banks would be able 
to carry out internal benchmarking, which could reduce valuation uncertainty and 
thus model risk (see also Sect. 4.7). 

External auditors already review banks’ internal models (both instrument and 
stochastic) when auditing the annual accounts. Ways could be explored of further 
improving or extending this process, e.g. to include a review of use test compliance. 

In the insurance industry, the chief actuary is personally responsible for the correct 
pricing of new products. This practice could be adopted in the NPI process used in 
the banking industry. The CRO would then be responsible for pricing products fairly, 
including products aimed at retail clients. The NPI process could also be made stricter 
by requiring external reviewers to approve major new products. Finally, the suitability 
of proxy approaches, which are extremely important in the NPI process, could be 
examined more stringently and in greater depth. 


5 Conclusion 


The key conclusions of this article can be summarised as follows: 

• A risk-sensitive and model-based approach to calculating capital requirements for 
banks should be retained. 

• Not only should model-based approaches be formally retained, but there should 
also continue to be a capital incentive to use these approaches (i.e. no overriding 
leverage ratio, no floor set at too high a level). 

• Non-risk-sensitive approaches to calculating capital requirements should, at most,
be used in a complementary capacity, serving merely as indicators and not as 
binding limits. Otherwise, dangerous perverse incentives will arise. 

• There are also dangers associated with risk-sensitive standardised approaches
because these typically overestimate or underestimate the actual risk. 

• Variation in the area of models is something that needs to be lived with to a certain 
extent. Some standardisation is nevertheless possible, as are other ways of restoring 
confidence. But it should not compromise the internal usability of models. 




U. Gaumert and M. Kemmer 


Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 


References 


1. Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Financ. 9(3), 
203-228 (1999) 

2. Basel Committee on Banking Supervision: Consultative document — Fundamental review of 
the trading book (2012) 

3. Basel Committee on Banking Supervision: Consultative document — Fundamental review of 
the trading book: A revised market risk framework (2013) 

4. Basel Committee on Banking Supervision: Consultative document — Revised Basel III leverage 
ratio framework and disclosure requirements (2013) 

5. Basel Committee on Banking Supervision: Discussion paper — The regulatory framework: bal- 
ancing risk sensitivity, simplicity and comparability (2013) 

6. Basel Committee on Banking Supervision: Regulatory consistency assessment programme 
(RCAP) — Analysis of risk-weighted assets for market risk (2013)

7. Basel Committee on Banking Supervision: Regulatory consistency assessment programme 
(RCAP) — Second report on risk-weighted assets for market risk in the trading book (2013)

8. Basel Committee on Banking Supervision: Basel III leverage ratio framework and disclosure 
requirements (2014) 

9. Bongers, O.: Mindestanforderungen an die Validierung von Risikomodellen. In: Martin, R.W., 
Quell, P., Wehn, C.: Modellrisiko und Validierung von Risikomodellen, pp. 33-64. Cologne 
(2013) 

10. Bundesverband deutscher Banken (Association of German Banks): Discussion paper: 
Finanzmarktturbulenzen — Gibt es Weiterentwicklungsmöglichkeiten von einzelnen Methoden
im Risikomanagement? (2008) 

11. Deutsche Bank AG: 2007 to 2013 Annual Reports 

12. Deutsche Bank AG: Investor Relations, presentation for Deutsche Bank analysts’ conference 
call on 31 January 2013, https://www.deutsche-bank.de/ir/de/images/Jain_Krause_4Q2012_ 
Analyst_call_31_Jan_2013_final.pdf 

13. Enhanced Disclosure Task Force: Enhancing the Risk Disclosures of Banks — Report of the 
Enhanced Disclosure Task Force (2013) 

14. European Banking Authority (EBA): EBA Guidelines on the Incremental Default and Migration
Risk Charge (IRC), EBA/GL/2012/3 (2012) 

15. European Banking Authority (EBA): EBA FINAL draft Regulatory Technical Standards on 
prudent valuation under Article 105(14) of Regulation (EU) No 575/2013 (Capital Require- 
ments Regulation — CRR) (2014) 

16. Federal Financial Supervisory Authority (Bundesanstalt für Finanzdienstleistungsaufsicht —
BaFin): 2007 Annual Report (2008) 

17. Frenkel, M., Rudolf, M.: Die Auswirkungen der Einführung einer Leverage Ratio als zusätzliche
aufsichtsrechtliche Beschränkung der Geschäftstätigkeit von Banken, expert opinion for
the Association of German Banks (2010) 

18. Gaumert, U.: Finanzmarktkrise — Höhere Kapitalanforderungen im Handelsbuch internationaler
Großbanken? In: Nagel, R., Serfling, K. (eds.) Banken, Performance und Finanzmärkte,
Festschrift for Karl Scheidl’s 80th birthday, pp. 117-150 (2009)

19. Gaumert, U.: Plädoyer für eine modellbasierte Kapitalunterlegung. In: Die Bank, 5/2013, pp.
35-39 (2013) 

20. Gaumert, U., Götz, S., Ortgies, J.: Basel III – eine kritische Würdigung. In: Die Bank, 5/2011, pp. 54-60 (2011)




21. Gaumert, U., Schulte-Mattler, H.: Höhere Kapitalanforderungen im Handelsbuch. In: Die Bank, 12/2009, pp. 58-64 (2009)

22. German Banking Industry Committee: Comments on the BCBS Discussion Paper “The regu- 
latory framework: balancing risk sensitivity, simplicity and comparability” (2013) 

23. German Banking Industry Committee: Position paper “Standardisierungsmöglichkeiten bei internen Marktrisikomodellen” (2013)

24. Haldane, A., Madouros, V.: The Dog and the Frisbee. Bank of England, London (2012)

25. Hartmann-Wendels, T.: Umsetzung von Basel III in europäisches Recht. In: Die Bank, 7/2012, pp. 38-44 (2012)

26. Hellwig, M., Admati, A.: The Bankers’ New Clothes. Princeton University Press, Princeton 
(2013) 

27. Kreditwesengesetz: Gesetz über das Kreditwesen – KWG, non-official reading version of the Deutsche Bundesbank, Frankfurt am Main, as at 2 January (2014)

28. Quell, P.: Grundsätzliche Aspekte des Modellrisikos. In: Martin, R.W., Quell, P., Wehn, C.: Modellrisiko und Validierung von Risikomodellen, pp. 15-32. Cologne (2013)

29. Regulation (EU) No 575/2013 of the European Parliament and of the Council of 26 June 2013 on 
prudential requirements for credit institutions and investment firms and amending Regulation 
(EU) No 648/2012 (CRR — Capital Requirements Regulation) 

30. Senior Supervisors Group: Observations on Risk Management Practices during Recent Market 
Turbulence (2008) 

31. Wissenschaftlicher Beirat beim BMWi (Academic advisory board at the Federal Ministry for 
Economic Affairs and Energy): Reform von Bankenregulierung und Bankenaufsicht nach der 
Finanzkrise, Berlin, report 03/2010 (2010) 

32. Zimmermann, G., Weber, M.: Die Leverage Ratio — Beginn eines Paradigmenwechsels in der 
Bankenregulierung? In: Risiko Manager 25/26, pp. 26-28 (2012) 


Model Risk in Incomplete Markets with Jumps 


Nils Detering and Natalie Packham 


Abstract We are concerned with determining the model risk of contingent claims 
when markets are incomplete. Contrary to existing measures of model risk, typically 
based on price discrepancies between models, we develop value-at-risk and expected 
shortfall measures based on realized P&L from model risk, resp. model risk and 
some residual market risk. This is motivated, e.g., by financial regulators’ plans to 
introduce extra capital charges for model risk. In an incomplete market setting, we 
also investigate the question of hedge quality when using hedging strategies from a 
(deliberately) misspecified model, for example, because the misspecified model is 
a simplified model where hedges are easily determined. An application to energy 
markets demonstrates the degree of model error. 


1 Introduction 


We are concerned with determining model risk of contingent claims when mar- 
ket models are incomplete. Contrary to existing measures of model risk, based on 
price discrepancies between models, e.g., [8, 26], we develop measures based on 
the realized P&L from model risk. This is motivated by financial regulators’ plans 
to introduce extra capital charges for model risk, e.g., [5, 13, 17]. In a complete 
and frictionless market model, the “residual” P&L observed on a perfectly hedged 
position is due to pricing and hedging in a misspecified model. The distribution of 
this P&L can therefore be taken as an input for specifying measures of model risk, 
such as expected loss, value-at-risk, or expected shortfall [10]. In an incomplete
market, model risk cannot be entirely isolated from market risk by hedging, and,
further, it is not clear a priori which hedging strategies are most effective under model
uncertainty. The purpose of this paper is to investigate these questions.

The analysis in [10] is primarily focussed on complete and frictionless market 
models, as this allows for a convenient separation into P&L from market risk and 
P&L from model risk: Since market risk is hedgeable, any remaining P&L is due to 
pricing and hedging in a misspecified model. In the setting of incomplete markets, 
one would rather distinguish between hedgeable and unhedgeable (or residual) P&L,
expressing that the unhedgeable P&L refers to model uncertainty and some unhedged 
market risk. However, from a practical perspective, as an institution needs to take 
care of both market risk and model risk — either through hedging or through capital 
requirements — the distinction is of minor importance. 

In addition, the determination and choice of effective hedging strategies in incom- 
plete markets is not as straightforward as the replicating argument in a complete 
market, but is of high practical relevance. The techniques developed in this paper are 
suitable for comparing the effectiveness of hedging strategies in incomplete markets
under model uncertainty. 

Model risk is associated with uncertainty about the model or probability measure 
that governs the probabilistic behavior of unknown outcomes. In this context, uncertainty
refers to uncertainty in the Knightian sense, e.g., [16, 23], in which case the
model uncertainty or model ambiguity is expressed by a set of probability measures, 
each of which defines a valid pricing and hedging model. 

A set of axioms for measures of model risk, in the spirit of coherent and convex 
risk measures [1, 18], was put forward by [8]. A popular measure fulfilling these 
axioms is a contingent claim’s price range across the set of models expressing the 
model uncertainty. This measure is generalized by [2] to account for a distribution 
on the model set. It thus allows one to incorporate the likelihood of the models into the
price range and as such to derive value-at-risk and expected shortfall type measures. 
However, these measures do not account for the potential losses from model risk 
realized when hedging in a misspecified model. In a complete market setup, [10] 
develop value-at-risk and expected shortfall measures on the distribution of losses 
from model risk, and show that these measures fulfill the axioms for model risk (with 
the usual exception of value-at-risk not being subadditive). 

As a generalization of [10], we develop measures for unhedged risk in incomplete 
markets, comprising both market and model risk. This applies, for example, when 
asset price processes are subject to jumps under the pricing measure, where, if at 
all, perfect replication of contingent claims is possible only under conditions not 
met in practice (such as infinitely many hedging instruments). Furthermore, in an 
incomplete market setting, we investigate the question of hedge quality when using 
hedging strategies from a (deliberately) misspecified model, for example, because 
the misspecified model is a simplified model where hedges are easily determined. 
A typical case could be to use a simplified complete market model to determine a 
replication strategy, when it is known that the actual market is incomplete. 
Several simulation studies investigate the risk from hedging in a simplified model,
e.g., [11, 24, 25]. However, to the best of our knowledge, this is never compared to 
the residual risk in the alternative model when following a risk-minimizing strategy. 
Yet, this comparison is important for selecting an appropriate model for pricing and 
hedging. 

In a case study, we examine the respective loss distributions and measures when
applied to options on energy futures. Empirical returns in the energy spot and futures
markets behave in a spiky way and thus need to be modeled with jump processes.
However, to reduce the computational cost and to attain a parsimonious model,
simplified continuous asset price processes are often assumed. Based on the measures
of model risk, we compare the quality and robustness of hedging in a continuous asset
price model, when the underlying price process in fact has jumps, with those of
determining hedges in the jump model itself. As asset price models, we employ
continuous and pure-jump versions of the Schwartz model [27], calibrated to the spot
market at the Nordic energy exchange Nord Pool.

The paper is structured as follows: In Sect. 2, we construct the loss variable and loss 
distribution relevant for model risk. Section 3 defines measures on the distribution 
of losses from model risk and relates them to the axioms for measures of model 
uncertainty introduced by [8]. In Sect. 4, we introduce a way of measuring the relative
losses from hedging in a misspecified model as opposed to hedging in the appropriate 
model. Finally, Sect. 5 contains a case study from the energy market to illustrate the 
relative loss measure and draw conclusions about the quality of hedging strategies 
determined in a complete model with continuous asset price processes, when the 
underlying market is in fact subject to jumps. 


2 Losses from Hedged Positions 

In this section, we formalize the market setup and the loss process expressing the 
residual losses from a hedged position. In the case of a complete and frictionless 
market, these losses correspond to model risk, whereas in the case of an incomplete 
market, these losses comprise in addition the market risk that is not hedged away. 


2.1 Market and Model Setup 

We begin with a standard market setup under model certainty, as in, e.g., [22]. On a
probability space $(\Omega, \mathcal{F}, \mathbb{Q})$ endowed with a filtration $(\mathcal{F}_t)_{t \ge 0}$ satisfying the “usual
hypotheses”, adapted asset price processes $S^j = (S^j_t)_{t \ge 0}$, $j = 0, \ldots, d$, are defined. The
asset with price process $S^0$ represents the money market account, whereas $S^1, \ldots, S^d$
are risky assets. All prices are discounted, that is, expressed in units of the money
market account, and are $\mathbb{Q}$-martingales, with $\mathbb{Q}$ a martingale measure equivalent to the
objective probability measure.

Throughout we shall assume that S is a Markov process. This applies to many 
models commonly used in practice, such as the Black-Scholes model, exponential 
Lévy models, exponential additive models, and stochastic volatility models, such as
the Heston model. We shall see below that the Markov assumption simplifies the 
analysis considerably. 

Fixing a time horizon $T$, we consider European-type claims with $\mathcal{F}_T$-measurable
integrable payoff. Other claims, in particular path-dependent options such as barrier
options, can be integrated into the analysis; we refer to [10] for the more general
case.

In addition to the risky assets $S = (S^1, \ldots, S^d)$, there may be tradeable options
maturing at $T$ written on $S$, with observable market prices at time 0, so-called benchmark
instruments. Their $\mathcal{F}_T$-measurable payoffs are denoted by $(H_i)_{i = 1, \ldots, I}$, and their
observed market prices by $C_i^*$, $i = 1, \ldots, I$, or by $[C_i^{\mathrm{bid}}, C_i^{\mathrm{ask}}]$, $i = 1, \ldots, I$, if no unique price is
available. These benchmark instruments can be used for static hedging, potentially
reducing a claim’s model risk.

A trading strategy is a predictable process $\Phi = (\phi^0, \ldots, \phi^d, u_1, \ldots, u_I)$, where
$\phi^j = (\phi^j_t)_{t \ge 0}$ denotes the holdings in asset $j$ and $u_i \in \mathbb{R}$ denotes the static holding of
benchmark instrument $i$. The time-$t$ value of the portfolio is

$$ V_t(\Phi) = \sum_{j=0}^d \phi^j_t S^j_t + \sum_{i=1}^I u_i H^i_t, $$

with $H^i_t$, $i = 1, \ldots, I$, the time-$t$ prices of the benchmark instruments.
To rule out arbitrage opportunities, we require that $\Phi$ is admissible. Further, $\Phi$ is
assumed to be self-financing, that is, $dV_t(\Phi) = \sum_{j=1}^d \phi^j_t \, dS^j_t + \sum_{i=1}^I u_i \, dH^i_t$ (the money market
account does not contribute, as $S^0 \equiv 1$ in discounted units).

A contingent claim with $\mathcal{F}_T$-measurable payoff $X$ is hedgeable if there exists a
replicating strategy, i.e., a self-financing trading strategy $\Phi$ such that $V_T(\Phi) = X$.
Hedging eliminates any P&L arising from market risk, and, because of the absence of
arbitrage opportunities, the claim’s price process and the value of the hedging strategy
agree for all $0 \le t \le T$. In an incomplete market, in the absence of a replicating
strategy, losses from market risk may be eliminated or reduced by super-replicating
strategies, e.g., [14], or by risk-minimizing strategies, e.g., [19, 20], but some P&L
due to market risk remains.

Aside from market risk, a stakeholder (trader, hedger, shareholder, regulator) may
be concerned about model risk when pricing and hedging a contingent claim. Model
risk refers to potential losses from mispricing and mishedging because model $\mathbb{Q}$ is
possibly misspecified. This uncertainty regarding model $\mathbb{Q}$ is captured by a set $\mathcal{Q}$ of
martingale measures for the asset price processes, e.g., [8, 9], which may incorporate
uncertainty about both the model type and model parameters.

Let

$$ \mathcal{C} = \Big\{ X \in \sigma(S_T) \;\Big|\; \sup_{\tilde{\mathbb{Q}} \in \mathcal{Q}} \mathbb{E}^{\tilde{\mathbb{Q}}}\big[X^2\big] < \infty \Big\} $$

be the set of contingent claims under consideration, where we require square-integrability
because quadratic risk-minimizing hedging strategies, which will be employed later,
exist for claims with finite second moments. The set of trading strategies considered is

$$ \mathcal{S} = \Big\{ \Phi \;\Big|\; \Phi \text{ admissible, predictable, self-financing},\ \Phi_t \in \sigma(S_t)\ \forall t \ge 0,\ \text{and } \mathbb{E}\Big[\int_0^T (\phi^j_t)^2 \, d[S^j, S^j]_t\Big] < \infty,\ j = 0, \ldots, d \Big\}. $$



The condition $\Phi_t \in \sigma(S_t)$, i.e., $\Phi_t$ is $\sigma(S_t)$-measurable, implies that the hedging strategy is a Markov process.

Working with a set of measures requires further conditions, in particular because the
measures in $\mathcal{Q}$ need not be absolutely continuous with respect to $\mathbb{Q}$. More specifically,
the asset price processes must be consistent under all measures, and specifying
trading strategies requires the notion of a stochastic integral with respect to the set
of measures.

In case the models in Q are diffusion processes, [28] develop the necessary tools 
from stochastic analysis, such as existence of a stochastic integral, martingale rep- 
resentation, etc. Although this restricts the joint occurrence of certain probability 
measures, it does not exclude any particular measure. For our purposes, this limi- 
tation does not play a role, as we are primarily interested in choosing a rich set of 
possible models to cover the model uncertainty. For details, we refer to [10]. 

In the general case, we pose the following condition on the set of measures Q, 
which ensures that all objects are well defined when working with uncountably many 
measures. 

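Assumption 1 There exists a universal version of the stochastic integral $\int_0^\cdot \Phi \, dS$,
$\Phi \in \mathcal{S}$. In addition, for all $\tilde{\mathbb{Q}} \in \mathcal{Q}$, the integral coincides $\tilde{\mathbb{Q}}$-a.s. with the usual
probabilistic construction, and $\int_0^\cdot \Phi \, dS$ is $\mathcal{F}$-measurable.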


2.2 Loss Process 

Consider a short position in a claim $X \in \mathcal{C}$ and a trading strategy $\Phi \in \mathcal{S}$. The time-$T$
loss of $X$ that we consider is given by

$$ L_T(X, \Phi) := -\big(V_T(\phi) - Y\big), \qquad (1) $$

where $V_T(\phi) := V_T((\phi, 0, \ldots, 0))$ is the terminal value of the dynamic part $\phi = (\phi^0, \ldots, \phi^d)$
of the strategy and $Y = X - \sum_{i=1}^I u_i H_i$. If $\mathbb{Q}$ calibrates to the
market prices of the benchmark instruments, i.e., $\mathbb{E}[H_i] = C_i^*$, $i = 1, \ldots, I$, then
$L_T(X, \Phi) = -(V_T(\Phi) - X)$, which corresponds to the overall realized loss from the
position. However, if $\mathbb{Q}$ does not calibrate perfectly to the benchmark instruments,
then there is additional instantaneous P&L at time 0 from trading the benchmark
instruments. This P&L is not included in Eq. (1) and will be ignored in what follows,
as it is booked as (sunk) trading cost and does not give rise to further risks.
The goal will be to extend this variable to a loss process $L_t(X, \Phi)$, $t \le T$, with
$\Phi$ a hedging, resp. replicating, strategy under $\mathbb{Q}$. As both the time-$t$ price $\mathbb{E}[Y \mid \mathcal{F}_t]$
and the strategy $\Phi$ are defined only $\mathbb{Q}$-a.s., one must be explicit in specifying the
version to be used when dealing with models that are not absolutely continuous with
respect to $\mathbb{Q}$. In our setup, we have $\mathbb{E}[Y \mid \mathcal{F}_t] = \mathbb{E}[Y \mid S_t] = f(S_t)$ for some Borel-measurable
function $f$, and likewise for the trading strategy. Since $\mathcal{Q}$ expresses the
model uncertainty when employing $\mathbb{Q}$ for pricing and hedging, it must not be involved
in the choice of the respective versions of the pricing and hedging strategies.

Assumption 2 The versions of $\mathbb{E}[Y \mid S_t]$, $t \le T$, and $\Phi$ are chosen irrespective of the
measures contained in $\mathcal{Q}$.

We further impose linearity conditions on the versions of $\mathbb{E}[Y \mid \mathcal{F}_t]$ and $\Phi$, which
in general are only fulfilled $\mathbb{Q}$-a.s., but for all practically relevant models and claims
hold for all $\omega \in \Omega$. This will be important for the axiomatic setup in Sect. 3.2.

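Assumption 3 Let $X_1, X_2 \in \mathcal{C}$, $\Phi_1 = (\phi_1, u^1_1, \ldots, u^1_I)$, $\Phi_2 = (\phi_2, u^2_1, \ldots, u^2_I) \in \mathcal{S}$,
and define $Y_j := X_j - \sum_{i=1}^I u^j_i H_i$, $j = 1, 2$. For all $t \le T$, it holds that

$$ \mathbb{E}[aY_1 + bY_2 \mid \mathcal{F}_t](\omega) = a\,\mathbb{E}[Y_1 \mid \mathcal{F}_t](\omega) + b\,\mathbb{E}[Y_2 \mid \mathcal{F}_t](\omega), \quad a, b \in \mathbb{R},\ \omega \in \Omega, $$

and

$$ V_t\big(a\phi_1(\omega) + b\phi_2(\omega)\big) = a V_t\big(\phi_1(\omega)\big) + b V_t\big(\phi_2(\omega)\big), \quad a, b \in \mathbb{R},\ \omega \in \Omega. $$

Moreover, $\mathbb{E}[Y_j \mid S_T](\omega) = Y_j(\omega)$, $\omega \in \Omega$, $j = 1, 2$.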

Assumptions 2 and 3 will be fulfilled in typical cases relevant in practice. Suppose,
for example, that $S$ is a Black-Scholes model under $\mathbb{Q}$. Then prices and the replicating
strategy of European payoffs can be determined via the Black-Scholes PDE, and
these are suitable versions fulfilling the assumptions.

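Definition 1 Let $X \in \mathcal{C}$ and $\Phi = (\phi, u_1, \ldots, u_I) \in \mathcal{S}$. The loss process associated
with a short position in $X$ and the trading strategy $\Phi$ is given by

$$ L_t := L_t(X, \Phi) = -\big(V_t(\phi) - \mathbb{E}[Y \mid S_t]\big) = -\Big(V_0 + \sum_{j=1}^d \int_0^t \phi^j_s \, dS^j_s - \mathbb{E}[Y \mid S_t]\Big), \quad 0 \le t \le T, \qquad (2) $$

with $Y = X - \sum_{i=1}^I u_i H_i$ and $V_0 = \mathbb{E}[Y]$.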

If $\Phi$ is a replicating strategy under $\mathbb{Q}$, then $L_t = 0$ $\mathbb{Q}$-a.s., but possibly $\tilde{\mathbb{Q}}(L_t = 0) < 1$ for some
$\tilde{\mathbb{Q}} \in \mathcal{Q}$, which expresses that $\Phi$ fails to replicate $X$ under $\tilde{\mathbb{Q}}$. A
model-free hedging strategy is defined as follows:

Definition 2 The trading strategy $\Phi = ((\phi_t)_{0 \le t \le T}, u_1, \ldots, u_I)$ is a model-free or
model-independent replicating strategy for claim $X$ with respect to $\mathcal{Q}$ if $L_t = 0$,
$t \ge 0$, $\tilde{\mathbb{Q}}$-a.s. for all $\tilde{\mathbb{Q}} \in \mathcal{Q}$.
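To make Eq. (2) and Definition 2 concrete, the following Python sketch evaluates a time-discretized version of the loss process along a single sampled path of one risky asset. The function `loss_process` and its arguments are illustrative assumptions, not part of the paper. As a check, it uses a forward contract $X = S_T - K$ held with one unit of the underlying: in discounted units, $\mathbb{E}^{\tilde{\mathbb{Q}}}[X \mid S_t] = S_t - K$ in every martingale model $\tilde{\mathbb{Q}}$, so the loss vanishes on any path, including paths with jumps. This is a simple instance of model-free replication.

```python
import numpy as np

def loss_process(path, times, price_fn, delta_fn):
    """Discretized Eq. (2): L_t = -(V_0 + sum_k phi_{t_k} (S_{t_{k+1}} - S_{t_k}) - E[Y | S_t]).

    path     : prices S_{t_0}, ..., S_{t_n} of one risky asset on the grid `times`
    price_fn : (s, t) -> model price E[Y | S_t = s] under the hedging model Q
    delta_fn : (s, t) -> hedge ratio phi_t held over the next grid interval
    """
    path = np.asarray(path, dtype=float)
    times = np.asarray(times, dtype=float)
    v0 = price_fn(path[0], times[0])  # V_0 = E[Y]
    increments = delta_fn(path[:-1], times[:-1]) * np.diff(path)
    gains = np.concatenate(([0.0], np.cumsum(increments)))  # discrete stochastic integral
    return -(v0 + gains - price_fn(path, times))

strike = 1.0
grid = np.linspace(0.0, 1.0, 6)
jumpy_path = [1.0, 1.1, 0.7, 0.75, 1.3, 1.2]  # arbitrary path, including "jumps"
losses = loss_process(jumpy_path, grid, lambda s, t: s - strike, lambda s, t: np.ones_like(s))
print(losses)  # numerically zero on every grid point: model-free replication of the forward
```

Replacing `delta_fn` by a model-dependent hedge ratio (e.g., a Black-Scholes delta for a call) generally produces nonzero losses on such a path, which is exactly the P&L the loss process is designed to record.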




Note that our definition of the hedge error, based on a continuous-time integral,
separates model risk from discretization error. When actually calculating the hedge
error, it is necessary to use a time grid fine enough that the discretization error
is negligible.
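The discretization effect can be illustrated with a small Python sketch, assuming a correctly specified Black-Scholes model (20 % volatility, zero interest rate, an at-the-money call with three months to maturity; all parameters chosen only for illustration). Because there is no model risk in this setting, the terminal loss is pure discretization error, and its standard deviation should shrink roughly like $1/\sqrt{n}$ in the number $n$ of rebalancing dates. The helper names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])  # exact erf, applied elementwise

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma):
    """Black-Scholes call price with zero interest rate (discounted units)."""
    sd = sigma * math.sqrt(tau)
    d1 = np.log(s / k) / sd + 0.5 * sd
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sd)

def discrete_hedge_loss(n_steps, sigma=0.2, s0=1.0, k=1.0, T=0.25, n_paths=2000, seed=0):
    """Terminal loss -(V_T - (S_T - K)^+) of a delta hedge rebalanced n_steps times
    in a correctly specified Black-Scholes model, i.e., pure discretization error."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    v = np.full(n_paths, float(bs_call(s0, k, T, sigma)))  # V_0 = model price
    for i in range(n_steps):
        sd = sigma * math.sqrt(T - i * dt)
        phi = norm_cdf(np.log(s / k) / sd + 0.5 * sd)  # Black-Scholes delta
        z = rng.standard_normal(n_paths)
        s_new = s * np.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z)
        v += phi * (s_new - s)  # self-financing gains in discounted units
        s = s_new
    return -(v - np.maximum(s - k, 0.0))

for n in (25, 100, 400):
    print(n, discrete_hedge_loss(n).std())
```

In a model-risk computation, the same grid refinement would be applied until the printed dispersion no longer changes materially, so that the remaining P&L can be attributed to the model rather than to the grid.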

The following proposition shows that the overall expected loss at time $T$ from
replicating in $\mathbb{Q}$ when the market evolves according to $\mathbb{Q}_M$ instead of $\mathbb{Q}$ depends
only on the price difference.

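Proposition 1
1. The total expected loss from replicating claim $X$ under $\mathbb{Q}$, that is, $\mathbb{E}^{\mathbb{Q}_M}[L_T]$ plus the initial transaction cost $\mathbb{E}^{\mathbb{Q}_M}[\sum_{i=1}^I u_i (H_i - C_i^*)]$, when the market evolves according to $\mathbb{Q}_M$ is just the price difference in the two models, $-(\mathbb{E}[X] - \mathbb{E}^{\mathbb{Q}_M}[X])$.

2. The price range measure, defined by $\sup_{\tilde{\mathbb{Q}} \in \mathcal{Q}} \mathbb{E}^{\tilde{\mathbb{Q}}}[X] - \inf_{\tilde{\mathbb{Q}} \in \mathcal{Q}} \mathbb{E}^{\tilde{\mathbb{Q}}}[X]$, can be expressed as $\sup_{\mathbb{Q}, \tilde{\mathbb{Q}} \in \mathcal{Q}} \mathbb{E}^{\tilde{\mathbb{Q}}}[L^{\mathbb{Q}}_T]$, where $L^{\mathbb{Q}}_T$ denotes the loss variable from hedging under $\mathbb{Q}$.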

Proof See [10]. 
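For part 1, the following short computation may help. It is a sketch under two assumptions not spelled out above, namely that $\mathbb{Q}$ calibrates to the benchmark instruments, $\mathbb{E}[H_i] = C_i^*$, and that the stochastic integral in Eq. (2) is a true $\mathbb{Q}_M$-martingale; the full proof is in [10].

```latex
\begin{aligned}
\mathbb{E}^{\mathbb{Q}_M}[L_T]
  &= -\Big(\mathbb{E}[Y] + \mathbb{E}^{\mathbb{Q}_M}\Big[\sum_{j=1}^d \int_0^T \phi^j_t \, dS^j_t\Big] - \mathbb{E}^{\mathbb{Q}_M}[Y]\Big)
   = \mathbb{E}^{\mathbb{Q}_M}[Y] - \mathbb{E}[Y] \\
  &= \mathbb{E}^{\mathbb{Q}_M}[X] - \mathbb{E}[X]
     - \sum_{i=1}^I u_i \big(\mathbb{E}^{\mathbb{Q}_M}[H_i] - C_i^*\big).
\end{aligned}
```

Adding the transaction cost $\sum_{i=1}^I u_i(\mathbb{E}^{\mathbb{Q}_M}[H_i] - C_i^*)$ leaves $-(\mathbb{E}[X] - \mathbb{E}^{\mathbb{Q}_M}[X])$, the price difference in the two models.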

If a claim cannot be replicated, then, given the static hedging component
$\sum_{i=1}^I u_i H_i$, a hedging strategy can be defined as a solution $(V_0, \phi) \in \mathbb{R} \times \mathcal{S}$
of the optimization problem

$$ \inf_{(V_0 \in \mathbb{R},\, \phi \in \mathcal{S})} \mathbb{E}\big[U(L_T(X, \Phi))\big] = \inf_{(V_0 \in \mathbb{R},\, \phi \in \mathcal{S})} \mathbb{E}\Big[U\Big(-\Big(V_0 + \sum_{j=1}^d \int_0^T \phi^j_t \, dS^j_t - Y\Big)\Big)\Big], \qquad (3) $$

where $U: \mathbb{R} \to \mathbb{R}_+$ weighs the magnitude of the hedge error. The most common
choice is $U(x) = x^2$, which minimizes the quadratic hedge error. This so-called
quadratic hedging has the advantage that the resulting pricing and hedging rules
become linear, and it is also analytically the most tractable rule. Under this choice of
$U$, if $S$ is a martingale, then a solution exists and $V_0 = \mathbb{E}[Y]$ [20].
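A minimal numerical illustration of quadratic hedging, assuming a single rebalancing date (one period) and a Merton-type jump-diffusion for the discounted asset price. This model is a stand-in chosen here for illustration, not one of the paper's models. In this setting, minimizing the sample version of $\mathbb{E}[(V_0 + \phi\,\Delta S - Y)^2]$ over $(V_0, \phi)$ is an ordinary least-squares regression of the payoff on $(1, \Delta S)$. Because of the jumps the market is incomplete, so the residual hedge error does not vanish. Parameter values and function names are hypothetical.

```python
import numpy as np

def merton_one_period(s0, sigma, lam, jump_mean, jump_std, dt, n, seed=1):
    """Samples of S_dt under a Merton jump-diffusion that is a martingale (zero rate)."""
    rng = np.random.default_rng(seed)
    kappa = np.exp(jump_mean + 0.5 * jump_std**2) - 1.0  # E[e^J] - 1 for a single jump
    n_jumps = rng.poisson(lam * dt, n)
    # sum of n_jumps i.i.d. N(jump_mean, jump_std^2) log-jumps
    jumps = n_jumps * jump_mean + np.sqrt(n_jumps) * jump_std * rng.standard_normal(n)
    diffusion = sigma * np.sqrt(dt) * rng.standard_normal(n)
    log_ret = (-0.5 * sigma**2 - lam * kappa) * dt + diffusion + jumps  # compensated drift
    return s0 * np.exp(log_ret)

def quadratic_hedge_one_period(payoff, ds):
    """Minimize the sample mean of (V0 + phi * dS - Y)^2 over (V0, phi) by least squares."""
    design = np.column_stack([np.ones_like(ds), ds])
    (v0, phi), *_ = np.linalg.lstsq(design, payoff, rcond=None)
    hedge_error = v0 + phi * ds - payoff  # equals -L_T(X, Phi) in the notation of Eq. (1)
    return v0, phi, hedge_error

s1 = merton_one_period(1.0, 0.2, 5.0, -0.1, 0.15, 0.25, 50_000)
y = np.maximum(s1 - 1.0, 0.0)  # at-the-money call payoff
ds = s1 - 1.0
v0, phi, err = quadratic_hedge_one_period(y, ds)
print(v0, phi, err.std())  # err.std() > 0: the jumps cannot be hedged with the underlying alone
```

With only the intercept and one slope, the regression reproduces the familiar one-period variance-optimal hedge ratio $\phi = \mathrm{Cov}(Y, \Delta S)/\mathrm{Var}(\Delta S)$, and $V_0$ is close to the sample mean of $Y$ because $\Delta S$ has mean (approximately) zero.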

Of course, in an incomplete market, $L_T(X, \Phi)$ entails not only losses due to model
misspecification but also some losses due to market risk, since $\mathbb{Q}(L_T(X, \Phi) = 0) < 1$;
that is, P&L is incurred even when there is no model uncertainty.

For the explicit determination of $L_t(X, \Phi)$ in some examples, we refer to [10]. It
is worth noting that in a complete market setup, the loss process corresponds to the
tracking error of [15].

2.3 Loss Distribution 


The next step is to associate a distribution with the loss variable $L_t$, $t \le T$, based
on which risk measures such as value-at-risk and expected shortfall can be defined.
This is achieved by considering an extended probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where
$\mathcal{F}$ now additionally incorporates the model uncertainty and $\mathbb{P}$ contains information
about the degree of uncertainty associated with each model. To make this precise,
let $\mathcal{G} \subset \mathcal{F}$ be a $\sigma$-algebra such that conditioning on $\mathcal{G}$ eliminates the uncertainty
about the pricing measure $\tilde{\mathbb{Q}} \in \mathcal{Q}$. In this setting, the measures in $\mathcal{Q}$ constitute a
regular conditional probability with respect to $\mathcal{G}$. For existence and construction of
this probability space, we refer to [10].

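In this setup, the models can be indexed by a random variable $\theta \in \Theta \subset \mathbb{R}$, with
$\sigma(\theta) = \mathcal{G}$, so that $\mathbb{Q}_\theta = \mathbb{P}(\cdot \mid \sigma(\theta))$ and

$$ \mathbb{P}(B) = \mathbb{E}^{\mathbb{P}}\big[\mathbb{P}(B \mid \sigma(\theta))\big] = \int_\Omega \mathbb{P}(B \mid \sigma(\theta)) \, d\mathbb{P} = \int_\Theta \mathbb{P}(B \mid \theta = \vartheta) \, \mu(d\vartheta), \quad B \in \mathcal{F}, $$

where $\mu$ is the distribution of $\theta$. In particular, losses from hedging in a misspecified
model under model uncertainty have distribution function

$$ \mathbb{P}(L_t \le x) = \int_\Theta \mathbb{Q}_\vartheta(L_t \le x) \, \mu(d\vartheta), \quad 0 \le t \le T. $$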
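When $\Theta$ is finite, say with models $\mathbb{Q}_{\vartheta_1}, \ldots, \mathbb{Q}_{\vartheta_m}$ and weights $\mu_k = \mu(\{\vartheta_k\})$, the integral above reduces to a weighted sum, and each $\mathbb{Q}_{\vartheta_k}(L_t \le x)$ can be estimated from losses simulated under model $k$. A minimal Python sketch under these assumptions (the function name and the toy data are hypothetical):

```python
import numpy as np

def mixture_cdf(loss_samples_by_model, model_weights, x):
    """Estimate P(L_t <= x) = sum_k mu_k * Q_k(L_t <= x) from per-model loss samples."""
    weights = np.asarray(model_weights, dtype=float)
    weights = weights / weights.sum()  # mu must be a probability distribution
    per_model = [np.mean(np.asarray(s, dtype=float) <= x) for s in loss_samples_by_model]
    return float(np.dot(weights, per_model))

samples = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0]]  # toy losses under two models
print(mixture_cdf(samples, [0.75, 0.25], 1.5))  # 0.75 * 0.5 + 0.25 * 0.0 = 0.375
```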


The following proposition is proved in [10]. 

Proposition 2 A strategy $\Phi$ is a model-free hedging strategy for claim $X$ $\mathbb{P}$-a.s. if
and only if $\mathbb{P}(L_t = 0) = 1$.

Hence, model uncertainty is expressed by the unconditional distribution $\mathbb{P}$,
whereas model certainty is expressed via the conditional distribution $\mathbb{P}(\cdot \mid \sigma(\theta))$.

A concrete approach to determining the distribution $\mu$ of $\theta$ is presented in [10]. There,
probability weights are assigned to the models in $\mathcal{Q}$ via the Akaike Information
Criterion (AIC), e.g., [6, 7], which trades off calibration quality against model
complexity.
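One standard way to turn AIC values into model probabilities is via Akaike weights, $w_k \propto \exp\big(-\tfrac{1}{2}(\mathrm{AIC}_k - \min_j \mathrm{AIC}_j)\big)$, with $\mathrm{AIC} = 2p - 2\log \hat{L}$ for a model with $p$ parameters and maximized likelihood $\hat{L}$. Whether [10] uses exactly this mapping is not stated here, so the Python sketch below is only an illustration under that assumption; the log-likelihoods and parameter counts are made up.

```python
import numpy as np

def aic(max_log_likelihood, n_params):
    """Akaike Information Criterion: 2p - 2 log L_hat."""
    return 2.0 * n_params - 2.0 * max_log_likelihood

def akaike_weights(aic_values):
    """Normalized weights exp(-Delta_k / 2), with Delta_k = AIC_k - min_j AIC_j."""
    a = np.asarray(aic_values, dtype=float)
    w = np.exp(-0.5 * (a - a.min()))  # subtracting the minimum avoids underflow
    return w / w.sum()

# three hypothetical calibrated models: (maximized log-likelihood, number of parameters)
fits = [(-48.0, 2), (-47.0, 4), (-50.0, 5)]
weights = akaike_weights([aic(ll, p) for ll, p in fits])
print(weights)  # the second model fits better but pays a complexity penalty
```

The resulting vector can serve as the weights $\mu_k$ of a finite model set in the mixture representation of the loss distribution.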


3 Measures of Model Risk 

The loss distribution aggregated across the measures in $\mathcal{Q}$ from Sect. 2.3 is the key
input to define measures of model risk. For the time being, we continue to work
in a setting where a particular model $\mathbb{Q}$ is used for pricing and hedging, as this
appropriately quantifies the model risk from a bank’s internal perspective.

If a claim cannot be replicated, and the trading strategy $\Phi$ is merely a hedging
strategy in some risk-minimizing sense, then the loss variable $L_t(X, \Phi)$ from
Definition 1 features not only model risk but also the unhedged market risk. To
disentangle model risk from market risk, one could first determine the market risk
from the unhedged part of the claim under $\mathbb{Q}$ and relate it to the overall
residual risk. This requires taking into account potential diversification effects, since
risks are not additive. We shall continue to work under the setup of measuring
residual risk and use the terminology “model risk,” although some market risk is also
present.

Market incompleteness can also be seen as a form of model risk, as, in addition
to the uncertainty about the objective measure, it causes uncertainty about the equivalent
martingale measure. However, hedging strategies would typically be chosen to be
risk-minimizing not under the martingale measure but under the
objective measure. In the case of continuous asset prices, this implies that hedging
is done under the minimal martingale measure, which is uniquely determined. In
practice, it is more common to choose an equivalent measure that calibrates
sufficiently well, and in this case one could argue that incompleteness also increases
model uncertainty. In our setup, this would be reflected by a larger set $\mathcal{Q}$.


3.1 Value-at-Risk and Expected Shortfall 

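The usual value-at-risk and expected shortfall measures are defined as follows:

Definition 3 Let $L_t(X, \Phi)$ be the time-$t$ loss from the strategy $\Phi$ that hedges claim
$X$ under $\mathbb{Q}$. Given a confidence level $\alpha \in (0, 1)$,

1. Value-at-risk (VaR) is given by

$$ \mathrm{VaR}_\alpha(L_t(X, \Phi)) = \inf\{ l \in \mathbb{R} : \mathbb{P}(L_t(X, \Phi) > l) \le 1 - \alpha \}, $$

that is, $\mathrm{VaR}_\alpha$ is just the $\alpha$-quantile of the loss distribution;

2. Expected shortfall (ES) is given by

$$ \mathrm{ES}_\alpha(L_t(X, \Phi)) = \frac{1}{1 - \alpha} \int_\alpha^1 \mathrm{VaR}_u(L_t(X, \Phi)) \, du. $$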
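In a simulation setting, the distribution $\mathbb{P}$ can be represented by pooling loss samples from the individual models, each of the $n_k$ samples from model $k$ receiving weight $\mu_k / n_k$; this weighting is an assumption of the sketch, chosen to match the mixture representation in Sect. 2.3. The Python sketch below computes the empirical $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ of Definition 3 for such a weighted sample. The function names are hypothetical.

```python
import numpy as np

def weighted_var_es(losses, weights, alpha):
    """Empirical VaR_alpha and ES_alpha (Definition 3) of a weighted discrete loss distribution."""
    order = np.argsort(losses)
    l = np.asarray(losses, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w / w.sum())  # distribution function at the sorted losses
    # VaR: smallest l with P(L <= l) >= alpha, i.e., P(L > l) <= 1 - alpha
    var = l[min(int(np.searchsorted(cum, alpha)), len(l) - 1)]
    # ES = (1/(1-alpha)) * int_alpha^1 VaR_u du: probability mass of each atom above alpha
    lower = np.maximum(np.concatenate(([0.0], cum[:-1])), alpha)
    mass = np.clip(cum - lower, 0.0, None)
    return float(var), float(np.dot(l, mass) / (1.0 - alpha))

def pooled_var_es(loss_samples_by_model, model_weights, alpha):
    """Pool per-model samples, giving each sample of model k the weight mu_k / n_k."""
    mu = np.asarray(model_weights, dtype=float) / np.sum(model_weights)
    losses = np.concatenate([np.asarray(s, dtype=float) for s in loss_samples_by_model])
    weights = np.concatenate([np.full(len(s), m / len(s))
                              for s, m in zip(loss_samples_by_model, mu)])
    return weighted_var_es(losses, weights, alpha)

print(weighted_var_es([1.0, 2.0, 3.0, 4.0], [0.25] * 4, 0.5))  # (2.0, 3.5)
```

The integral form of ES is evaluated exactly for the discrete distribution: each atom contributes its loss times the part of its probability mass lying above level $\alpha$.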


In the presence of benchmark instruments, the hedging strategy in model $\mathbb{Q}$ may
not be unique. If the claim $X$ can be replicated, then $\Pi = \{\Phi \in \mathcal{S} : \mathbb{Q}(L_t(X, \Phi) = 0) = 1,\ t \le T\}$
is the set of replicating strategies for claim $X$ in model $\mathbb{Q}$. Otherwise,
we focus on quadratic hedging and define $\Pi = \{\Phi = (\phi^0, \ldots, \phi^d, u_1, \ldots, u_I) \in \mathcal{S} : (u_1, \ldots, u_I) \in \mathbb{R}^I,\ \phi = \hat{\phi} \text{ under } \mathbb{Q}\}$,
where $\hat{\phi}$ refers to the quadratic risk-minimizing strategy for $Y = X - \sum_{i=1}^I u_i H_i$
attaining the infimum in (3) with $U(x) = x^2$. Because in an
incomplete market the loss from hedging entails some market risk aside from model
risk, the benchmark instruments play a more important role than in a complete market:
they are not necessarily redundant, but may reduce the hedge error under $\mathbb{Q}$.

To abstract from the particular hedging strategy chosen, we define measures that
quantify the minimal degree of model dependence, indicating that, when pricing and
hedging under measure $\mathbb{Q}$, the model dependence cannot be reduced further. This is
reasonable in the sense that it is not of interest whether a position is indeed hedged
or not. Rather, the hedging argument serves only to eliminate (or reduce, in case the
claim cannot be replicated) P&L from market risk. Choosing the minimal degree
allows one to appropriately capture claims that can be replicated in a model-free way.

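Definition 4 Concrete measures capturing the model uncertainty when pricing and
hedging claim $X$ according to model $\mathbb{Q}$ are given by

1. $\mu_{\mathrm{qe},t}(X) = \inf_{\Phi \in \Pi} \mathbb{E}^{\mathbb{P}}[L_t(X, \Phi)^2]$,

2. $\mu_{\mathrm{VaR},\alpha,t}(X) = \inf_{\Phi \in \Pi} \mathrm{VaR}_\alpha(|L_t(X, \Phi)|)$,

3. $\mu_{\mathrm{ES},\alpha,t}(X) = \inf_{\Phi \in \Pi} \mathrm{ES}_\alpha(|L_t(X, \Phi)|)$,

4. $\rho_{\mathrm{VaR},\alpha,t}(X) = \inf_{\Phi \in \Pi} \max(\mathrm{VaR}_\alpha(L_t(X, \Phi)), 0)$,

5. $\rho_{\mathrm{ES},\alpha,t}(X) = \inf_{\Phi \in \Pi} \max(\mathrm{ES}_\alpha(L_t(X, \Phi)), 0)$.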

The measures $\mu_{\mathrm{VaR},\alpha,t}$ and $\mu_{\mathrm{ES},\alpha,t}$ capture model uncertainty in an absolute sense
and are thus measures of the magnitude or degree of model uncertainty. The measures
$\rho_{\mathrm{VaR},\alpha,t}$ and $\rho_{\mathrm{ES},\alpha,t}$ consider losses only. As such, they are suitable for defining a
capital charge against losses from model risk.
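If $\Pi$ is approximated by a finite set of candidate strategies (for instance, a grid of static holdings $u_i$; this discretization is an assumption of the sketch, not part of the definition), the infima in Definition 4 become minima over per-strategy loss samples drawn from $\mathbb{P}$. The Python sketch below re-implements a weighted VaR/ES estimator so that it is self-contained; the dictionary keys `mu_qe`, `mu_VaR`, and so on are hypothetical labels for $\mu_{\mathrm{qe},t}$, $\mu_{\mathrm{VaR},\alpha,t}$, $\mu_{\mathrm{ES},\alpha,t}$, $\rho_{\mathrm{VaR},\alpha,t}$, and $\rho_{\mathrm{ES},\alpha,t}$.

```python
import numpy as np

def var_es(losses, weights, alpha):
    """Weighted empirical VaR_alpha and ES_alpha (Definition 3); weights sum to one."""
    order = np.argsort(losses)
    l = np.asarray(losses, dtype=float)[order]
    cum = np.cumsum(np.asarray(weights, dtype=float)[order])
    var = l[min(int(np.searchsorted(cum, alpha)), len(l) - 1)]
    lower = np.maximum(np.concatenate(([0.0], cum[:-1])), alpha)
    es = np.dot(l, np.clip(cum - lower, 0.0, None)) / (1.0 - alpha)
    return float(var), float(es)

def model_risk_measures(candidate_losses, weights, alpha):
    """Definition 4 with Pi replaced by the finite set of strategies in `candidate_losses`.

    candidate_losses : dict strategy name -> loss samples L_t(X, Phi) drawn from P
    weights          : probability weight of each sample (shared by all strategies)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    rows = []
    for losses in candidate_losses.values():
        l = np.asarray(losses, dtype=float)
        var_abs, es_abs = var_es(np.abs(l), w, alpha)
        var_l, es_l = var_es(l, w, alpha)
        rows.append((np.dot(w, l**2), var_abs, es_abs, max(var_l, 0.0), max(es_l, 0.0)))
    names = ("mu_qe", "mu_VaR", "mu_ES", "rho_VaR", "rho_ES")
    # each measure is minimized separately, so different strategies may attain the infima
    return {n: float(min(r[k] for r in rows)) for k, n in enumerate(names)}

example = {"A": [-1.0, 1.0, -1.0, 1.0], "B": [0.5, 0.5, 0.5, 0.5]}
print(model_risk_measures(example, [0.25] * 4, 0.5))
```

In this toy example, strategy A attains the smallest $\rho_{\mathrm{VaR}}$ while strategy B attains the smallest values of the other measures, which shows why each infimum in Definition 4 is taken separately.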

In contrast to bank-internal risk measurement, a regulator may wish to
measure model risk independently of a particular pricing or hedging measure, taking
a more prudent approach. To abstract from the pricing measure, one would first
define the set $\mathcal{Q}_H \subseteq \mathcal{Q}$ of potential pricing and hedging measures (e.g., measures
that calibrate sufficiently well) and then define the risk measure in a worst-case sense
as follows:

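Definition 5 Let $\mu^{\mathbb{Q}_H}(X)$ be a measure of model uncertainty when pricing and
hedging $X$ according to model $\mathbb{Q}_H \in \mathcal{Q}_H$. The model uncertainty of claim $X$ is
given by

$$ \mu(X) = \sup_{\mathbb{Q}_H \in \mathcal{Q}_H} \mu^{\mathbb{Q}_H}(X). \qquad (4) $$

Capital charges can then be determined from either $\rho_{\mathrm{VaR},\alpha,t}(X)$, resp. $\rho_{\mathrm{ES},\alpha,t}(X)$,
or from $\mu_{\mathrm{VaR},\alpha,t}(X)$, resp. $\mu_{\mathrm{ES},\alpha,t}(X)$.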


3.2 Axioms for Measures of Model Risk 

Cont [8] introduces a set of axioms for measures of model risk. A measure satisfying
these axioms is called a convex measure of model risk. The axioms follow the general
notion of convex risk measures [18, 21], but are adapted to the special case of model
risk. In particular, these axioms take into account the possibility of static hedging
with liquidly traded options and of hedging in a model-free way. More specifically,
the axioms postulate that an option that can be statically hedged with liquidly traded
options is assigned a model risk bounded by the cost of replication, which can be
expressed in terms of the bid-ask spread. Consequently, partial static hedging of
a claim reduces model risk. Further, the possibility of model-free hedging with the
underlying asset reduces model risk to zero. Finally, to express that model risk can
be reduced through diversification, convexity is required.

Here we only state the following result, which ensures that our measures fulfill
the axioms proposed in Cont [8]. The proof is given in [10] for complete markets
and can easily be generalized to an incomplete market.

Proposition 3 The measures $\mu_{\mathrm{qe},t}(X)$, $\mu_{\mathrm{ES},\alpha,t}(X)$, and $\rho_{\mathrm{ES},\alpha,t}(X)$ satisfy the
axioms of model uncertainty. The measures $\mu_{\mathrm{VaR},\alpha,t}(X)$ and $\rho_{\mathrm{VaR},\alpha,t}(X)$ satisfy
Axioms 1, 2, and 4.


4 Hedge Differences 

Instead of considering the P&L arising from model misspecification as in Sect. 2.2,
one might be interested in a direct comparison of hedging strategies implied by
different models. For example, one might wish to assess how well hedging
strategies determined from a deliberately misspecified but simpler model perform in a more
appropriate but more involved model.

We first explain the idea with respect to one alternative model $\mathbb{Q}_M \in \mathcal{Q}$ and then
outline how measures with respect to the entire model set can be built. As before, $\mathbb{Q}$
is the model for pricing and hedging and, fixing a claim $X \in \mathcal{C}$, $\Pi$ is the set of
quadratic risk-minimizing (QRM) hedging strategies for $X$ under $\mathbb{Q}$ (containing
various hedging strategies, depending on how static hedges with the benchmark
instruments are chosen).

We seek an answer to the following question: If the market turns out to follow
$\mathbb{Q}_M$, what is the loss incurred by hedging in $\mathbb{Q}$ instead of hedging in $\mathbb{Q}_M$? Let
$\Phi = (\phi, u_1, \ldots, u_I) \in \Pi$ be the QRM strategy for $Y = X - \sum_{i=1}^I u_i H_i$, and let
$\Phi_M$ be the respective QRM strategy for $Y$ derived under $\mathbb{Q}_M$. The relative difference
of the hedge portfolio compared to the hedge portfolio when using the strategy of
$\mathbb{Q}_M$ is given by

$$ L^\Delta_t(X, \Phi, \Phi_M) = \mathbb{E}^{\mathbb{Q}_M}[Y] - \mathbb{E}[Y] + \sum_{j=1}^d \int_0^t \big(\phi^j_{M,s} - \phi^j_s\big) \, dS^j_s. \qquad (5) $$
This variable differs from $L_t(X, \Phi)$, cf. Eq. (2), in that it expresses the difference
between the hedging strategies $\Phi$ and $\Phi_M$, whereas $L_t(X, \Phi)$ describes the difference
between the hedging strategy $\Phi$ and the claim $X$.¹

The next proposition provides some insight on the different nature of the two 
variables. 

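Proposition 4 The following properties hold for the processes $L^\Delta(X, \Phi, \Phi_M)$ and
$L(X, \Phi)$:

1. $L^\Delta(X, \Phi, \Phi_M)$ is a $\mathbb{Q}_M$-martingale with $L^\Delta_0(X, \Phi, \Phi_M) = \mathbb{E}^{\mathbb{Q}_M}[Y] - \mathbb{E}[Y]$;

2. $\mathbb{E}^{\mathbb{Q}_M}[L_T(X, \Phi)] = \mathbb{E}^{\mathbb{Q}_M}[Y] - \mathbb{E}[Y]$;

3. $L^\Delta_T(X, \Phi, \Phi_M) = L_T(X, \Phi)$ $\mathbb{Q}_M$-a.s. if $Y$ can be replicated under $\mathbb{Q}_M$;

4. $L^\Delta_t(X, \Phi, \Phi_M) - L_t(X, \Phi) = \mathbb{E}^{\mathbb{Q}_M}[Y \mid \mathcal{F}_t] - \mathbb{E}[Y \mid \mathcal{F}_t]$ $\mathbb{Q}_M$-a.s. if $Y$ can be
replicated under $\mathbb{Q}_M$.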

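Proof 1. This follows directly from the definition of $L^\Delta(X, \Phi, \Phi_M)$ and the fact
that $\Phi_M$ and $\Phi$ are in $\mathcal{S}$.

2. See Proposition 1.

3. If $Y$ can be replicated, then $Y = \mathbb{E}^{\mathbb{Q}_M}[Y] + \sum_{j=1}^d \int_0^T \phi^j_{M,t} \, dS^j_t$ $\mathbb{Q}_M$-a.s., and
consequently $L^\Delta_T(X, \Phi, \Phi_M) = Y - \big(\mathbb{E}[Y] + \sum_{j=1}^d \int_0^T \phi^j_t \, dS^j_t\big)$ $\mathbb{Q}_M$-a.s. The
claim follows by observing that $L_T(X, \Phi) = -\big(\mathbb{E}[Y] + \sum_{j=1}^d \int_0^T \phi^j_t \, dS^j_t - Y\big)$.

4. Since $Y$ can be replicated under $\mathbb{Q}_M$, we have
$L^\Delta_t(X, \Phi, \Phi_M) = \mathbb{E}^{\mathbb{Q}_M}[Y \mid \mathcal{F}_t] - \big(\mathbb{E}[Y] + \sum_{j=1}^d \int_0^t \phi^j_s \, dS^j_s\big)$ $\mathbb{Q}_M$-a.s.,
and the claim follows with the definition of $L_t(X, \Phi)$.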


Observe that the variable $L_t(X, \Phi)$ is neither a submartingale nor a supermartingale,
as shown in the example in [10, Sect. 3.5].

As an example, Fig. 1 shows the distributions of $L_t(X, \Phi)$ and $L^\Delta_t(X, \Phi, \Phi_M)$
for an at-the-money call option $X = (S_T - K)^+$ with $S_0 = K = 1$ and expiry
$T$ in 3 months, at time $t = T/2$, dynamically hedged with the underlying asset,
i.e., $\Phi = (\phi)$, resp. $\Phi_M = (\phi_M)$. Under the misspecified model $\mathbb{Q}$, the asset price
process is a geometric Brownian motion with 20 % volatility, whereas
under $\mathbb{Q}_M$ the asset price process follows a geometric Brownian motion with 25 %
volatility. The correlation of the two loss variables is 67.97 %. At maturity $T$, both
variables agree.
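The Black-Scholes example can be re-created with a short Monte Carlo sketch in Python. It assumes a zero interest rate, discrete rebalancing, no benchmark instruments (so $Y = X$), and far fewer paths and time steps than in Fig. 1 to keep the run time small; the market is simulated under $\mathbb{Q}_M$ (25 % volatility), and both hedge ratios are Black-Scholes deltas. The sample means should be close to the values reported in the caption of Fig. 1 (0.0053 and 0.0099) up to Monte Carlo error, and the printed sample correlation can be compared with the value quoted above. Helper names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma):
    """Black-Scholes call price, zero interest rate."""
    sd = sigma * math.sqrt(tau)
    d1 = np.log(s / k) / sd + 0.5 * sd
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sd)

def bs_delta(s, k, tau, sigma):
    sd = sigma * math.sqrt(tau)
    return norm_cdf(np.log(s / k) / sd + 0.5 * sd)

def simulate_losses(sigma_q=0.20, sigma_m=0.25, s0=1.0, k=1.0, T=0.25,
                    n_steps=200, n_paths=4000, seed=0):
    """Samples of L_t (Eq. (2)) and L^Delta_t (Eq. (5)) at t = T/2, market under Q_M."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    gains_q = np.zeros(n_paths)  # sum of phi   dS, hedge computed under Q
    gains_m = np.zeros(n_paths)  # sum of phi_M dS, hedge computed under Q_M
    for i in range(n_steps // 2):
        tau = T - i * dt
        phi_q = bs_delta(s, k, tau, sigma_q)
        phi_m = bs_delta(s, k, tau, sigma_m)
        z = rng.standard_normal(n_paths)  # the market evolves under Q_M
        s_new = s * np.exp(-0.5 * sigma_m**2 * dt + sigma_m * math.sqrt(dt) * z)
        gains_q += phi_q * (s_new - s)
        gains_m += phi_m * (s_new - s)
        s = s_new
    price_q0 = float(bs_call(s0, k, T, sigma_q))  # E[Y]
    price_m0 = float(bs_call(s0, k, T, sigma_m))  # E^{Q_M}[Y]
    loss = -(price_q0 + gains_q - bs_call(s, k, T - (n_steps // 2) * dt, sigma_q))
    loss_delta = price_m0 - price_q0 + gains_m - gains_q
    return loss, loss_delta

loss, loss_delta = simulate_losses()
print(loss.mean(), loss_delta.mean(), np.corrcoef(loss, loss_delta)[0, 1])
```

Because the discounted price is simulated as an exact $\mathbb{Q}_M$-martingale, the discrete hedging gains have mean zero, so the mean of `loss_delta` estimates the initial price difference regardless of the number of rebalancing dates.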

Generalizing the relative hedge difference to a set of models is not straightforward,
as the loss variable $L^\Delta(X, \Phi, \Phi^M)$ depends explicitly on $Q^M$ and, as such, a version
of the variable that is valid under all models cannot be constructed. Reference [12] shows how
a loss distribution under model uncertainty can be constructed, which can then be
used to define the usual risk measures such as value-at-risk and expected shortfall.
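Once a sample of losses is available, for instance simulated values of the loss variables above, the usual risk measures can be estimated empirically. A minimal sketch, assuming losses are recorded as positive numbers: value-at-risk is taken as the empirical $\alpha$-quantile and expected shortfall as the average of the order statistics from that quantile upward. This simple estimator is our choice and is not necessarily the construction of [12]; the function name is hypothetical.

```python
import math


def var_es(losses, alpha=0.99):
    """Empirical value-at-risk and expected shortfall at level alpha.
    VaR is the empirical alpha-quantile; ES averages the sorted losses from VaR upward."""
    xs = sorted(losses)
    k = math.ceil(alpha * len(xs)) - 1  # 0-based index of the empirical alpha-quantile
    k = min(max(k, 0), len(xs) - 1)
    var = xs[k]
    tail = xs[k:]
    return var, sum(tail) / len(tail)


print(var_es([4.0, 1.0, 3.0, 2.0], alpha=0.5))  # (2.0, 3.0)
```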


1 There is no need to pose specific conditions on the version of the hedging strategy $\Phi^M$ chosen,
since in the following only properties of $L^\Delta_t(X, \Phi, \Phi^M)$ under $Q^M$ are analyzed.

Fig. 1 Loss at $t = T/2$ from dynamically hedging an at-the-money call option with a maturity
$T$ of 3 months based on 10,000 simulations and 1,000 time steps. Left: Distribution of $L_t(X, \Phi)$,
$E^{Q^M}[L_t(X, \Phi)] = 0.0053$. Right: Distribution of $L^\Delta_t(X, \Phi, \Phi^M)$, $E^{Q^M}[L^\Delta_t(X, \Phi, \Phi^M)] = 0.0099$, which
equals the initial price difference


5 Application to Energy Markets 

As a real-world example, we study the loss variables and risks from hedging options
on futures in energy markets. Spot and futures prices in energy markets are
extremely volatile and show large spikes, so a realistic model for the price dynamics
should involve jumps. However, continuous models based on Brownian
motions are not only computationally more tractable but also prevalent in practice. Our
analysis sheds light on the risks of hedging in a simplified continuous model instead
of a model involving jumps.

Assume given a filtered space $(\Omega, (\mathcal{F}_t)_{0 \le t \le T})$ with a measure $P$ on which a
two-dimensional Lévy process $(L_t) = (L_{1,t}, L_{2,t})_{t \ge 0}$ with independent components
is defined. A popular two-factor model for the energy spot price was developed by
Schwartz and Smith [27]. The spot is driven by a short-term mean-reverting factor
accounting for short-term energy supply and demand, and by a long-term factor for
changes in the equilibrium price level. In its extended form [4, Sect. 5], the logarithm
of the spot price is

$$\log S_t = \Lambda_t + X_t + Y_t \qquad (6)$$

with $(\Lambda_t)_{t \ge 0}$ a deterministic seasonality function, $(X_t)_{t \ge 0}$ a Lévy-driven Ornstein-Uhlenbeck
process with dynamics $dX_t = -\lambda X_t\,dt + dL_{1,t}$, and $(Y_t)_{t \ge 0}$ defined by
$dY_t = dL_{2,t}$. We further assume that the cumulant function $\psi(z) := \log E[e^{\langle z, L_1 \rangle}]$
is well defined for $z = (z_1, z_2) \in \mathbb{R}^2$, $|z| \le C$, for some $C \in \mathbb{R}$. Due to the independence
of $L_1$ and $L_2$, the cumulant transforms of both processes add up and we have $\psi(z) =
\psi_1(z_1) + \psi_2(z_2)$, where $\psi_1$ and $\psi_2$ are the cumulants of $L_1$ and $L_2$, respectively.

We consider the pricing and hedging of options on the future contract. In contrast
to, for example, equity markets, the future contract in energy markets delivers over a period
of time $[T_1, T_2]$ instead of at a fixed time point, by defining a payout

$$\frac{1}{T_2 - T_1} \int_{T_1}^{T_2} S_r\,dr \qquad (7)$$

in return for the agreed future price. While the spot is not tradable due to lack of
storage opportunities, the future is tradable and used for hedging both options on the
future itself and options directly on the spot price. Assuming that the future price $F_t$
equals its expected payout

$$F_t = E^Q\left[\frac{1}{T_2 - T_1} \int_{T_1}^{T_2} S_r\,dr \,\middle|\, \mathcal{F}_t\right] \qquad (8)$$

under a pricing measure $Q \sim P$, the value $F_t$ is derived in analytic form in [4]. Under
the assumption that $L_1$ and $L_2$ are normal inverse Gaussian (NIG) distributed Lévy
processes, an approximate process $(F^L_t)_{t \le T_2}$ is determined in [4] by matching first
and second moments such that $(F^L_t)_{t \le T_2}$ is of exponential additive type. We assume
in this application that $Q = P$. The value of $F^L_t$ is then


$$F^L_t = F^L_0 \exp\left(-\int_0^t \big(\psi_1(\Sigma^L_1(s)) + \psi_2(\Sigma^L_2(s))\big)\,ds + \int_0^t \Sigma^L_1(s)\,dL_{1,s} + \int_0^t \Sigma^L_2(s)\,dL_{2,s}\right) \qquad (9)$$


with time-dependent, deterministic functions $\Sigma^L_1(t)$ and $\Sigma^L_2(t)$. The process $F^L$
depends on the interval $[T_1, T_2]$, but in order to avoid overloading the notation and
since we shall only consider a single delivery period in our example, we simply
write $F^L$, $\Sigma^L_1$ and $\Sigma^L_2$. The market under this model is incomplete, and claims can in
general only be hedged with risk-minimizing strategies. Integral representations for
prices and quadratic risk-minimizing hedge positions of call and put payoffs can be
derived; we refer the reader to [4, Prop. 3.9] for further details and the explicit
formulas.

As a pricing and hedging model, we consider a simplified version of (6), which
is driven by two (nonstandard) independent $Q$-Brownian motions $(B_{1,t})_{t \ge 0}$ and
$(B_{2,t})_{t \ge 0}$ defined on $(\Omega, (\mathcal{F}_t)_{0 \le t \le T})$, and we derive, again by moment matching,
an analogous approximate future price process $F^B$ of the form

$$F^B_t = F^B_0 \exp\left(-\int_0^t \big(\psi^B_1(\Sigma^B_1(s)) + \psi^B_2(\Sigma^B_2(s))\big)\,ds + \int_0^t \Sigma^B_1(s)\,dB_{1,s} + \int_0^t \Sigma^B_2(s)\,dB_{2,s}\right) \qquad (10)$$

with time-dependent, deterministic functions $\Sigma^B_1(t)$ and $\Sigma^B_2(t)$, and with $\psi^B_1(z)$ and
$\psi^B_2(z)$ being the cumulant transforms of $B_{1,1}$ and $B_{2,1}$.
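In a purely Gaussian two-factor setting, the expectation in (8) can be evaluated directly, because $\log S_r$ is normal conditional on $\mathcal{F}_t$. A minimal sketch, assuming nonstandard Brownian drivers $dB_{i,t} = \mu_i\,dt + \sigma_i\,dW_{i,t}$, a mean-reversion rate $\lambda > 0$ for $X$ as in (6), and a seasonality supplied as a function; the delivery average is approximated with the trapezoidal rule. This computes the exact conditional expectation for that Gaussian model, not the moment-matched process $F^B$ of (10), and all names and parameter values are hypothetical.

```python
import math


def expected_spot(t, r, x_t, y_t, lam, mu1, sig1, mu2, sig2, season):
    """E[S_r | F_t] when log S = season + X + Y with Gaussian factors:
    dX = -lam * X dt + mu1 dt + sig1 dW1  (mean-reverting, lam > 0)
    dY = mu2 dt + sig2 dW2               (arithmetic Brownian motion)."""
    tau = r - t
    decay = math.exp(-lam * tau)
    mean_x = x_t * decay + mu1 * (1.0 - decay) / lam
    var_x = sig1 ** 2 * (1.0 - decay ** 2) / (2.0 * lam)
    mean_y = y_t + mu2 * tau
    var_y = sig2 ** 2 * tau
    # Lognormal mean: E[exp(N(m, v))] = exp(m + v / 2), factors independent.
    return math.exp(season(r) + mean_x + mean_y + 0.5 * (var_x + var_y))


def future_price(t, t1, t2, n=200, **model):
    """Delivery-period average (8), integrated with the composite trapezoidal rule."""
    h = (t2 - t1) / n
    vals = [expected_spot(t, t1 + j * h, **model) for j in range(n + 1)]
    integral = h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
    return integral / (t2 - t1)


# Hypothetical parameters (daily units); delivery over [23, 30] as in the case study.
params = dict(x_t=0.0, y_t=0.0, lam=0.1, mu1=0.0, sig1=0.02,
              mu2=0.0, sig2=0.01, season=lambda r: math.log(40.0))
print(future_price(0.0, 23.0, 30.0, **params))
```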
Although the model has two sources of randomness, it is a complete model under
the filtration generated by the future price itself, as the next proposition shows, which
means that all practically relevant claims can be replicated.

Proposition 5 Let $(\mathcal{G}_t)_{t \le T}$ be the filtration generated by $F^B$ up to time $t$, i.e., $\mathcal{G}_t :=
\sigma\{F^B_s, s \le t\}$. Then the market consisting of $F^B$ and a constant riskless bank account
is a complete financial market with respect to $(\mathcal{G}_t)_{t \le T}$.

Proof See [12]. 

We estimate the parameters for both models based on futures and spot data from the
Nord Pool energy exchange. We use average daily system peak load electricity
spot prices for the period from January 2011 until May 2013 (prices as shown
on Bloomberg page “ENOSOSPK”) and weekday prices for front-month and
second-month future contracts. For details on the estimation procedure, we refer to
[4, Sect. 5.2]. In Table 1, we collect the parameter estimates for the two factors of
both models, the simplified model with two nonstandard Brownian motions and the
model with two independent NIG-Lévy processes. The estimates for the Brownian
factors are only the drift term $\mu$ and the volatility term $\sigma$. The NIG distribution is a
four-parameter distribution with scale parameter $\delta$, tail heaviness $\alpha$, skew parameter
$\beta$, and location parameter $\nu$, see [3].
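The four NIG parameters translate into moments through standard closed-form expressions with $\gamma = \sqrt{\alpha^2 - \beta^2}$: mean $\nu + \delta\beta/\gamma$, variance $\delta\alpha^2/\gamma^3$, skewness $3\beta/(\alpha\sqrt{\delta\gamma})$, and excess kurtosis $3(1 + 4\beta^2/\alpha^2)/(\delta\gamma)$. A small sketch applying them to the NIG estimates of Table 1, with the row assignment as read in that table; the function name is hypothetical. A normal distribution has zero skewness and excess kurtosis, so the sizeable values here indicate the tail behaviour the NIG fit captures. The resulting standard deviations (about 0.0146 and 0.215) are of the same order as the Brownian $\sigma$ estimates in Table 1.

```python
import math


def nig_moments(alpha, beta, nu, delta):
    """Mean, variance, skewness and excess kurtosis of an NIG(alpha, beta, nu, delta)
    variable with tail heaviness alpha, skew beta (|beta| < alpha), location nu, scale delta."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    mean = nu + delta * beta / gamma
    var = delta * alpha ** 2 / gamma ** 3
    skew = 3.0 * beta / (alpha * math.sqrt(delta * gamma))
    excess_kurt = 3.0 * (1.0 + 4.0 * beta ** 2 / alpha ** 2) / (delta * gamma)
    return mean, var, skew, excess_kurt


# (alpha, beta, nu, delta) as read from Table 1.
table1 = {"L1": (33.3008, -1.0988, -0.0009, 0.0071),
          "L2": (1.9240, -0.8860, 0.0176, 0.0622)}
for name, params in table1.items():
    m, v, s, k = nig_moments(*params)
    print(f"{name}: mean={m:.5f} std={math.sqrt(v):.4f} skew={s:.2f} excess kurtosis={k:.1f}")
```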

Figure 2 shows the empirical return distributions of both factors together with the 
density function of the estimated distribution. It is obvious that the NIG distribution 
provides a significantly better fit to the empirical returns than the normal distribution. 

The claim to be hedged is an option on a future with a one-week delivery period,
trading one month prior to expiry, so that $T_1 = 23$ and $T_2 = 30$. Based on the
parameter estimates, we determine the scaling terms $\Sigma^L_1(t)$ and $\Sigma^L_2(t)$ for the dynamics
of $F^L_t$ and the scaling terms $\Sigma^B_1(t)$ and $\Sigma^B_2(t)$ for the dynamics of $F^B_t$, respectively.
Assuming that the measures $Q$ and $\hat{Q}$ are orthogonal, we define an aggregating
process $F$ such that $F = F^B$ $Q$-a.s. and $F = F^L$ $\hat{Q}$-a.s. Pricing and hedging are
performed under $Q$, and there is only one alternative measure, denoted by $\hat{Q}$. Our
model set is thus $\mathcal{Q} = \{Q, \hat{Q}\}$. Applying the Akaike Information Criterion (AIC),
we assign a probability distribution to the model set $\mathcal{Q}$. It turns out that model $\hat{Q}$ is
assigned a probability of essentially 1 due to its much better fit of the returns, and we
simulate according to this model.
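The text does not state how the AIC values are turned into model probabilities. A common choice, assumed here, is Akaike weights $w_i \propto \exp(-\Delta_i/2)$ with $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$ and $\mathrm{AIC} = 2k - 2\ln \hat{L}$. The log-likelihoods below are hypothetical placeholders, not the Nord Pool fits; the parameter counts (two per Brownian factor, four per NIG factor) follow Table 1. Even a moderate likelihood advantage pushes the weight of the better model to essentially 1, as described above.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_values):
    """Normalized model probabilities w_i proportional to exp(-(AIC_i - AIC_min) / 2)."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical log-likelihoods of the return data under each model.
aic_brownian = aic(log_likelihood=1500.0, n_params=4)  # 2 factors x (mu, sigma)
aic_nig = aic(log_likelihood=1700.0, n_params=8)       # 2 factors x (alpha, beta, nu, delta)
w_brownian, w_nig = akaike_weights([aic_brownian, aic_nig])
print(w_brownian, w_nig)
```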

We consider an at-the-money call option $X := (F_{T_2} - F_0)^+$ and calculate the
hedge positions implied by $Q$. For the simulation of the process under $\hat{Q}$, we use 600
time steps in order to reduce the discretization error. We investigate the distributions
of $L^\Delta_T(X, \Phi, \Phi^{\hat{Q}})$ and $L_T(X, \Phi)$, with $\Phi$ and $\Phi^{\hat{Q}}$ dynamic hedging strategies, as there
are no benchmark instruments. As implied by Proposition 5, the hedging strategy is
actually a perfect hedge under the model $Q$.

Figure 3 shows on the left-hand side the distributions under $\hat{Q}$ of $L_T(X, \Phi)$ and
$L^\Delta_T(X, \Phi, \Phi^{\hat{Q}})$. For comparison, Fig. 3 shows the distribution under $\hat{Q}$ of the hedge error
$L_T(X, \Phi^{\hat{Q}})$ when hedging under $\hat{Q}$ (top right). Here, the hedge error is introduced
by market incompleteness.
Table 1 Estimated parameters for the NIG distributions of $L_{1,t}$ and $L_{2,t}$ and parameters for the
normal distributions of $B_{1,t}$ and $B_{2,t}$

NIG            α           β           ν           δ
$L_{2,t}$      1.9240     -0.8860      0.0176      0.0622
$L_{1,t}$     33.3008     -1.0988     -0.0009      0.0071

Normal         σ           μ
$B_{2,t}$      0.2328     -0.0004
$B_{1,t}$      0.0133      0.0002





Fig. 2 Empirical distributions of long-term factor (left) and short-term factor (right) together with 
fitted NIG distribution (solid line) and normal distribution (dashed line) 


It turns out that the loss due to the misspecified model $Q$ is minor compared to the
loss due to incompleteness. The loss due to model misspecification as measured by
$L^\Delta_T(X, \Phi, \Phi^{\hat{Q}})$ has a mean-squared value of $E^{\hat{Q}}[(L^\Delta_T(X, \Phi, \Phi^{\hat{Q}}))^2] =
9.50$. The mean-squared hedge error from hedging under the misspecified model is
greater, with $E^{\hat{Q}}[(L_T(X, \Phi))^2] = 34.61$. Although this magnitude
appears high, it is put into perspective by the fact that even under correct model
specification the mean-squared hedge error $E^{\hat{Q}}[(L_T(X, \Phi^{\hat{Q}}))^2]$ is 25.54. The initial prices
under the two models are $E^Q[X] = 10.954$ and $E^{\hat{Q}}[X] = 8.068$, respectively.
If we consider the variance of the loss variables, which corrects for the mean, it
turns out that the impact of the misspecified hedge is rather low. For the variable
$L^\Delta_T(X, \Phi, \Phi^{\hat{Q}})$, we get $\mathrm{Var}(L^\Delta_T(X, \Phi, \Phi^{\hat{Q}})) = 1.07$. We find that $\mathrm{Var}(L_T(X, \Phi))$
and $E^{\hat{Q}}[(L_T(X, \Phi^{\hat{Q}}))^2] = \mathrm{Var}(L_T(X, \Phi^{\hat{Q}}))$ are similar, with 25.71 and 25.56,
respectively. The lower right panel of Fig. 3 shows a scatter plot of $L_T(X, \Phi)$ and
$L_T(X, \Phi^{\hat{Q}})$. The two variables show a correlation of 97.91 %, implying a strong
linear dependence between the hedge error under model $\hat{Q}$ (market risk) and the
hedge error due to using the misspecified model $Q$.
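The reported mean-squared values and variances can be cross-checked against Proposition 4. Since $E[L^2] = \mathrm{Var}(L) + E[L]^2$, the mean of each loss variable is recoverable up to sign, and by items 1 and 2 of Proposition 4 (with $Q^M = \hat{Q}$) both means should equal the initial price difference in absolute value. A small arithmetic sketch using only the figures reported above; the function name is hypothetical.

```python
import math


def mean_magnitude(mean_square, variance):
    """|E[L]| recovered from E[L^2] = Var(L) + E[L]^2."""
    return math.sqrt(mean_square - variance)


# Figures reported in the text (simulation under Q-hat).
abs_mean_L = mean_magnitude(34.61, 25.71)      # L_T(X, Phi)
abs_mean_L_delta = mean_magnitude(9.50, 1.07)  # L^Delta_T(X, Phi, Phi^Qhat)
price_gap = 10.954 - 8.068                     # difference of the initial prices
print(round(abs_mean_L, 3), round(abs_mean_L_delta, 3), round(price_gap, 3))  # 2.983 2.903 2.886
```

The implied magnitudes, about 2.98 and 2.90, are close to the price gap of 2.886; the remaining differences presumably stem from Monte Carlo error and the rounding of the reported figures.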

The fact that the impact of hedging in the wrong model is relatively low in
this case study should not be misinterpreted. It confirms a stylized fact that is well
known for diffusion processes (see [15]), namely that hedging is robust as long as
the overall variance of the underlying is described sufficiently well by the model.
Fig. 3 Upper left: $\hat{Q}(L_T(X, \Phi) \le \cdot)$. Upper right: $\hat{Q}(L_T(X, \Phi^{\hat{Q}}) \le \cdot)$. Lower left: $\hat{Q}(L^\Delta_T(X, \Phi,
\Phi^{\hat{Q}}) \le \cdot)$. Lower right: Scatter plot of $L_T(X, \Phi)$ and $L_T(X, \Phi^{\hat{Q}})$


The overall volatility in our setup is the same for both models due to the moment
matching procedure, and uncertainty in this volatility is likely to result in greater
model risk. The study also makes clear that the hedging error due to incompleteness
cannot be neglected.


Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Bid-Ask Spread for Exotic Options
under Conic Finance


Florence Guillaume and Wim Schoutens 


Abstract This paper puts the concepts of model and calibration risks into the
perspective of bid and ask pricing and marketed cash-flows, which originate from
conic finance theory. Different asset pricing models calibrated to liquidly traded
derivatives by making use of various plausible calibration methodologies lead to
different risk-neutral measures, which can be seen as the test measures used to assess
the (un)acceptability of risks.

Keywords Calibration risk • Model risk • Exotic bid-ask spread • Conic finance • 
Metric-free calibration risk measure 


1 Introduction 

The publication of the pioneering work of Black and Scholes in 1973 sparked off
an unprecedented boom in the derivative market, paving the way for the use of
financial models for pricing financial instruments and hedging financial positions.
Since the late 1970s, spurred by the emergence of a liquid market for plain-vanilla
options, a multitude of option pricing models has seen the light of day, in an attempt to
mimic the stylized facts of empirical returns and implied volatility surfaces. The
need for such advanced pricing models, ranging from stochastic volatility models to
models with jumps and many more, intensified even further after Black Monday,
which evidenced the inability of the classical Black-Scholes model to explain the
intrinsic smiling nature of implied volatility. The resulting wide panoply of models
has inescapably given rise to what is commonly referred to as model uncertainty or, by
abuse of language, model risk. The ambiguity in question is the Knightian uncertainty as
defined by Knight [17], i.e., the uncertainty about the true process generating the data,
as opposed to the notion of risk dealing with the uncertainty on the future scenario of a
given stochastic process. This relatively new kind of “risk” has grown significantly
over the last decade due to the rapid growth of the derivative market and has in some
instances led to colossal losses caused by the misvaluation of derivative instruments.
Recently, the financial community has shown increased interest in the assessment
of model and parameter uncertainty (see, for instance, Morini [19]). In particular,
the Basel Committee on Banking Supervision [2] has issued a directive to compel
financial institutions to take into account the uncertainty of the model valuation in
the mark-to-model valuation of exotic products. Cont [6] set up the theoretical basis
of a quantitative framework built upon coherent or convex risk measures and aimed
at assessing model uncertainty by a worst-case approach.¹ Addressing the question
from a more practical angle, Schoutens et al. [22] illustrated on real market data
how models fitting the option surface equally well can lead to significantly different
results once used to price exotic instruments or to hedge a financial position.

Another source of risk for the price of exotics originates from the choice of the
procedure used to calibrate a specific model to market data. Indeed, although
the standard approach consists of solving the so-called inverse problem, i.e., quoting
Cont [7], of finding the parameters for which the value of benchmark instruments,
computed in the model, corresponds to their market prices, alternative procedures
have emerged. The ability of the model to replicate the current market situation
could instead be specified in terms of the goodness of fit of the distribution or in terms of
moments of the asset log-returns, as proposed by Eriksson et al. [9] and Guillaume
and Schoutens [12]. In practice, even solving the inverse problem requires making
a choice among several equally suitable alternatives. Indeed, perfectly matching the
whole set of liquidly traded instruments is typically not possible, so one
looks for an “optimal” match, i.e., for the parameter set which replicates as well as
possible the market prices of a set of benchmark instruments. Put another way, we
minimize the distance between the model and the market prices of those standard
instruments. Hence, the calibration exercise requires not only the definition of
a distance and its metric but also the specification of the benchmark
instruments. Benchmark instruments usually refer to liquidly traded instruments. In
equity markets, it is common practice to select liquid European vanilla options.
But even with such a precise specification, several equally plausible selections can
arise. We could for instance select out-of-the-money options with a positive bid price,
following the methodology used by the Chicago Board Options Exchange (CBOE
[4]) to compute the VIX volatility index, or select out-of-the-money options with a
positive trading volume, or ... Besides, practitioners sometimes resort to time series
or market quotes to fix some of the parameters beforehand, allowing for greater
stability of the calibrated parameters over time. In particular, the recent emergence
of a liquid market for volatility derivatives has made it possible to use this methodology to
calibrate stochastic volatility models. Such an alternative has been investigated in
Guillaume and Schoutens [11] under the Heston stochastic volatility model, where


1 Another framework for risk management under Knightian uncertainty is based on the concept of
g-expectations (see, for instance, Peng [20] and references therein).
the spot variance and the long-run variance are inferred from the spot value of the VIX
volatility index and from the VIX option price surface, respectively. Another example
is Brockhaus and Long [3] (see also Guillaume and Schoutens [13]), who propose to
choose the spot variance, the long-run variance, and the mean-reversion rate of the
Heston stochastic volatility model in order to replicate as well as possible the term
structure of model-free variance swap prices, i.e., of the expected future total variance
of the return. Regarding the specification of the distance metric, several alternatives can
be found in the literature. The discrepancy can be defined as a relative, absolute, or
least-squares difference and expressed in terms of price or implied volatility.
Detlefsen and Härdle [8] introduced the concept of calibration risk (or should we say
calibration uncertainty) arising from the different (plausible) specifications of the
objective function we want to minimize. Later, Guillaume and Schoutens [10, 11]
extended the concept of calibration risk to include not only the choice of the functional
but also the calibration methodology, and illustrated it under the Heston stochastic
volatility model.

In order to measure the impact of model or parameter ambiguity on the price of
structured products, several alternatives have been proposed in the financial
literature. Cont [6] proposed the so-called worst-case approach, where the impact of model
uncertainty on the value of a claim is measured by the difference between the
supremum and infimum of the expected claim price over all pricing models consistent with
the market quotes of a set of benchmark instruments (see also Hamida and Cont [16]).
Gupta and Reisinger [14] adopted a Bayesian approach, allowing for a distribution
of exotic prices resulting directly from the posterior distribution of the parameter set,
obtained by updating a plausible prior distribution using a set of liquidly traded
instruments (see also Gupta et al. [15]). Another methodology allowing for a distribution
of exotic prices, but based on risk-capturing functionals, has recently been proposed
by Bannor and Scherer [1]. This method differs from the Bayesian approach since the
distribution of the parameter set is constructed explicitly by allocating a higher
probability to parameter sets leading to a lower discrepancy between the model and market
prices of a set of benchmark instruments. Whereas the Bayesian approach requires
a parametric family of models and is consequently appropriate to assess parameter
uncertainty, the two alternative proxies (i.e., the worst-case and the risk-capturing
functionals approaches) can be used to quantify the ambiguity resulting from a
broader set of models with different intrinsic characteristics. These three approaches
share the characteristic that the plausibility of any pricing measure $\mathbb{Q}$ is assessed
by considering the average distance between the model and market prices, either
by allocating to each measure a probability weight that depends on this
distance or by selecting the measures $\mathbb{Q}$ for which the distance falls within the
average bid-ask spread. Hence, the resulting measure of uncertainty implicitly depends
on the metric chosen to express this average distance. We adopt a somewhat
different methodology, although similar to the ones mentioned above. We start from
a set of plausible calibration procedures and consider the resulting risk-neutral
probability measures (i.e., the optimal parameter sets) as the test measures used to
assess the (un)acceptability of any zero-cost cash-flow $X$. In other words, these
pricing measures can be seen as the ones defining the cone of acceptable cash-flows,
where $X$ is acceptable or marketed, denoted by $X \in \mathcal{A}$, if its expectation under any
of the test measures is nonnegative:

$$E^{\mathbb{Q}}[X] \ge 0, \quad \forall \mathbb{Q} \in \mathcal{M}.$$

This allows us to define the cone of marketed cash-flows in a market-consistent way
rather than parametrically in terms of some family of concave distortion functions, as
proposed by Cherny and Madan [5]. We can even vary the minimum proportion $p$
of model prices lying within their bid-ask spread in order to change the amplitude
of the cone of acceptability, by requiring that at least $\lceil pM \rceil$ model prices are within
their market spread for $\mathbb{Q}$ to be included in the set of test measures:

$$\mathbb{Q} \in \mathcal{M} \iff \#\{i : P_i^{\mathbb{Q}} \in [b_i, a_i],\ i = 1, \ldots, M\} \ge \lceil pM \rceil,$$

where $P_i^{\mathbb{Q}}$, $a_i$, $b_i$, $i = 1, \ldots, M$, denote the model price under the pricing measure
$\mathbb{Q}$, the quoted ask price, and the quoted bid price of the $M$ benchmark instruments,
respectively. The higher the proportion, the smaller the set of test measures $\mathcal{M}$ and
hence the wider the cone of acceptability. We opt for a threshold expressed as a
percentage rather than as an average distance since we want our specification to be
free of any distance metric. Indeed, the set $\mathcal{M}$ will be built by considering different
objective functions (expressed as price or implied volatility differences, as absolute,
relative, or least-squares differences, ...), and we do not want to
favor any of these metrics over the others. The impact of model or
parameter uncertainty on the price of exotic (i.e., illiquid) instruments is then assessed
by adopting a worst-case approach as in Cont [6]:

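$$s(p) = \max_{\mathbb{Q} \in \mathcal{M}} \{EP^{\mathbb{Q}}\} - \min_{\mathbb{Q} \in \mathcal{M}} \{EP^{\mathbb{Q}}\}, \qquad (1)$$

provided that $\mathcal{M} \neq \emptyset$, where $EP^{\mathbb{Q}}$ denotes the exotic price under the pricing measure
$\mathbb{Q}$. The model uncertainty can thus be quantified by the bid-ask spread of illiquid
products. Indeed, the cash-flow of selling a claim with payoff $X$ at time $T$ at its ask
price $a$ is acceptable for the market if $E^{\mathbb{Q}}[a - \exp(-rT)X] \ge 0$ for all $\mathbb{Q} \in \mathcal{M}$, i.e., if
$a \ge \exp(-rT) \max_{\mathbb{Q} \in \mathcal{M}} \{E^{\mathbb{Q}}[X]\}$. For the sake of competitiveness, the ask price is set
at the minimum value, i.e.,

$$a = \exp(-rT) \max_{\mathbb{Q} \in \mathcal{M}} \{E^{\mathbb{Q}}[X]\}.$$

Similarly, the cash-flow of buying a claim with payoff $X$ at time $T$ at its bid price $b$
is acceptable for the market if $E^{\mathbb{Q}}[-b + \exp(-rT)X] \ge 0$ for all $\mathbb{Q} \in \mathcal{M}$, i.e., taking
the maximum possible value for competitiveness reasons,

$$b = \exp(-rT) \min_{\mathbb{Q} \in \mathcal{M}} \{E^{\mathbb{Q}}[X]\}.$$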
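The selection rule for the test measures and the resulting conic bid and ask prices can be sketched as follows. Assumptions: each calibrated measure is represented by its vector of benchmark model prices $P_i^{\mathbb{Q}}$ and by its undiscounted exotic expectation $E^{\mathbb{Q}}[X]$; the quotes and prices in the example are hypothetical, as are the function names. The difference ask minus bid is the spread $s(p)$ of (1) computed on discounted prices.

```python
import math


def in_spread_count(prices, bids, asks):
    """Number of benchmark model prices lying inside their quoted [bid, ask]."""
    return sum(1 for x, b, a in zip(prices, bids, asks) if b <= x <= a)


def select_test_measures(model_prices, bids, asks, p):
    """Indices of calibrated measures with at least ceil(p * M) prices in spread."""
    need = math.ceil(p * len(bids))
    return [q for q, prices in enumerate(model_prices)
            if in_spread_count(prices, bids, asks) >= need]


def conic_bid_ask(exotic_expectations, selected, r, T):
    """Bid = exp(-rT) min E^Q[X] and ask = exp(-rT) max E^Q[X] over the test measures."""
    if not selected:
        raise ValueError("empty set of test measures: no exotic spread for this p")
    vals = [exotic_expectations[q] for q in selected]
    disc = math.exp(-r * T)
    return disc * min(vals), disc * max(vals)


# Hypothetical example: 3 calibrated measures, 4 benchmark options.
bids = [1.0, 2.0, 3.0, 4.0]
asks = [1.2, 2.2, 3.2, 4.2]
model_prices = [[1.1, 2.1, 3.1, 4.1],   # 4 of 4 in spread
                [1.1, 2.1, 3.5, 4.5],   # 2 of 4 in spread
                [0.5, 2.5, 3.5, 4.5]]   # 0 of 4 in spread
exotic_expectations = [10.0, 12.0, 8.0]
for p in (0.5, 0.9):
    sel = select_test_measures(model_prices, bids, asks, p)
    bid, ask = conic_bid_ask(exotic_expectations, sel, r=0.0, T=0.25)
    print(p, sel, bid, ask, ask - bid)
```

Raising $p$ from 0.5 to 0.9 leaves only the first measure, so the spread shrinks from 2 to 0, mirroring the statement that a higher threshold gives fewer test measures and thinner exotic spreads.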
The impact of model uncertainty can be expressed as a function of the severity of the
percentage threshold $p$. We note that decreasing the threshold ultimately boils down
to considering a thinner set of benchmark instruments, since the model price has to
fall within the market bid-ask spread for a smaller number of calibration instruments
for a pricing measure to be selected. In particular, such a relaxation typically
results in the “elimination” of the most illiquid calibration instruments, i.e., deep
out-of-the-money options in the case of equity markets (see Fig. 2).

For the numerical study, we consider the Variance Gamma (VG) model of Madan
et al. [18] only, although the methodology can equally be used to assess calibration
and/or model uncertainty. The calibration instrument set consists of liquid
out-of-the-money options: moving away from the forward price, we select put and
call options with a positive bid price and with a strike lower and higher than the
forward price, respectively, until we encounter two successive options with
zero bid. Denoting by $P_i = (a_i + b_i)/2$ the mid-price of option $i$ and by $\sigma_i$ its implied
volatility, the set of measures results from the following specifications for the
objective function we minimize (i.e., for the distance and its metric):

1. Root-mean-square error (RMSE)

   a. price specification

   $$\mathrm{RMSE}_P = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \big(P_i - P_i^{\mathbb{Q}}\big)^2}$$

   b. implied volatility specification

   $$\mathrm{RMSE}_\sigma = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \big(\sigma_i - \sigma_i^{\mathbb{Q}}\big)^2}$$

2. …

   b. implied volatility specification

3. Average absolute error (APE)

   a. price specification

   $$\mathrm{APE}_P = \frac{1}{M \bar{P}} \sum_{i=1}^{M} \big|P_i - P_i^{\mathbb{Q}}\big|$$

   b. implied volatility specification

   $$\mathrm{APE}_\sigma = \frac{1}{M \bar{\sigma}} \sum_{i=1}^{M} \big|\sigma_i - \sigma_i^{\mathbb{Q}}\big|$$

where $\bar{P}$ and $\bar{\sigma}$ denote the average option price and the average implied
volatility, respectively.

Each of these six objective functions can again be subdivided into an unweighted
functional, for which the weight $\omega_i = \omega$ for all $i$, and a weighted functional, for which
the weight $\omega_i$ is proportional to the trading volume of option $i$. We furthermore
consider the possibility of adding an extra penalty term to the objective function in order
to force the model prices to lie within their market bid-ask spread. Besides these
standard specifications (in terms of the price or the implied volatility of the calibration
instruments), we consider the so-called moment matching market implied
calibration proposed by Guillaume and Schoutens [12], which consists in matching the
moments of the asset log-return inferred from the implied volatility
surface. As the VG model is fully characterized by three parameters, we consider three
standardized moments, namely the variance, the skewness, and the kurtosis. Since,
as shown by Guillaume and Schoutens [12], the variance can always be perfectly
matched, we either allocate the same weight to the matching of the skewness and the
kurtosis or we give priority to matching the lower moment, i.e., the skewness. This leads to
a total of 26 plausible calibration procedures, each of them leading to a test measure
$\mathbb{Q} \in \mathcal{M}$ provided that the proportion of model prices falling within their market
bid-ask spread is at least equal to the threshold $p$.
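A minimal sketch of the RMSE and APE functionals, applicable to either prices or implied volatilities. Assumptions: weights enter as normalized weights (equal weights $1/M$ reproduce the unweighted formulas above, volume-proportional weights give the weighted variants), and the penalty form and its weight are hypothetical, since the text does not specify them. The count of 26 procedures is consistent with six functionals, two weightings, and an optional penalty ($6 \times 2 \times 2 = 24$), plus the two moment-matching variants.

```python
import math


def _weights(m, volumes=None):
    """Equal weights 1/M, or weights proportional to trading volume (normalized)."""
    if volumes is None:
        return [1.0 / m] * m
    total = float(sum(volumes))
    return [v / total for v in volumes]


def rmse(market, model, volumes=None):
    """Root-mean-square error between market and model quantities (prices or vols)."""
    w = _weights(len(market), volumes)
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, market, model)))


def ape(market, model, volumes=None):
    """Average absolute error normalized by the average market quantity."""
    w = _weights(len(market), volumes)
    avg = sum(market) / len(market)
    return sum(wi * abs(x - y) for wi, x, y in zip(w, market, model)) / avg


def spread_penalty(model, bids, asks, weight=1.0):
    """Hypothetical penalty: squared distance of each model price outside [bid, ask]."""
    return weight * sum(max(b - x, 0.0) ** 2 + max(x - a, 0.0) ** 2
                        for x, b, a in zip(model, bids, asks))


# Six functionals x {unweighted, weighted} x {no penalty, penalty} + 2 moment-matching variants.
N_PROCEDURES = 6 * 2 * 2 + 2
print(N_PROCEDURES)  # 26
```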

2 Exotic Bid-Ask Spread 

For the numerical study, we consider daily S&P 500 option surfaces for a timespan
ranging from October 2008 to October 2009, therefore including the recent credit
crunch.² We calibrate the VG model daily on the quoted (liquid) maturity which is the
closest to the reference maturity of three months. Note that we only consider
maturities for which the total trading volume of out-of-the-money options exceeds 1,000
contracts, which avoids the extreme situation of an underdetermined calibration
problem where the number of parameters to calibrate is higher than the number of
benchmark instruments. This also ensures that the number of option prices is large
enough (and so the strike range wide and fine enough) to guarantee a sufficient
precision for the derived market implied moments. For each of the trading days


2 The data are taken from the KU Leuven data collection which is a private collection of historical 
daily spot and option prices of major US equity stocks and indices. 
Fig. 1 Maximum proportion $\pi$ of option prices replicated within their bid-ask spread (upper) and
option bid-ask spreads (lower)


included in the sample period, we successively perform the 26 calibration
methodologies, which leads to 26 optimal parameter sets. We then select those for which
the proportion of model prices falling within their market bid-ask spread is at least
$p$. The higher the threshold $p$, the fewer the test measures $\mathbb{Q} \in \mathcal{M}$ and hence the
thinner the exotic bid-ask spreads. Figure 1 shows the highest proportion $\pi$ of option
prices replicated within their bid-ask spread for the 26 above-mentioned calibration
procedures:

$$\pi = \frac{1}{M} \max_{\mathbb{Q}} \#\{i : P_i^{\mathbb{Q}} \in [b_i, a_i],\ i = 1, \ldots, M\}.$$
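The maximum proportion $\pi$ uses the same in-spread count as the selection rule. A small sketch with hypothetical quotes and a hypothetical function name; whenever the returned value falls below the chosen threshold $p$, no calibrated measure qualifies and the set of test measures is empty.

```python
def max_in_spread_proportion(model_prices, bids, asks):
    """pi = (1/M) * max over calibrated measures of the number of prices in [bid, ask]."""
    m = len(bids)
    counts = [sum(1 for x, b, a in zip(prices, bids, asks) if b <= x <= a)
              for prices in model_prices]
    return max(counts) / m


bids = [1.0, 2.0, 3.0, 4.0]
asks = [1.2, 2.2, 3.2, 4.2]
model_prices = [[1.1, 2.1, 3.5, 4.5], [1.1, 2.1, 3.1, 4.5]]  # hypothetical
pi = max_in_spread_proportion(model_prices, bids, asks)
print(pi)  # 0.75, so no test measure exists for p = 0.9
```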

If $\pi < p$, then $\mathcal{M}$ is an empty set and no exotic spread as defined by (1) exists for that
particular threshold $p$. Hence, when selecting the proportion threshold $p$,
we should keep in mind the trade-off between the in-spread precision and the number
of test measures. Indeed, the higher the proportion, the higher the precision but the
fewer the measures selected as test measures, which can in turn lead to an
underestimation of the calibration uncertainty measured as the exotic bid-ask spreads. From
Fig. 1, we observe that $\pi$ is significantly higher during the heart of the recent credit
crunch, i.e., from the beginning of the sample period until mid-2009. This can
be explained by the typically wider bid-ask spreads observed during periods of market
distress. Indeed, as shown in the lower panel of Fig. 1, the quoted spread for
at-the-money, in-the-money ($K = 0.75 S_0$), and out-of-the-money ($K = 1.25 S_0$) options
has shrunk significantly after the troubled period of October 2008 to July 2009.

[Fig. 2 panels: number of options replicated within bid-ask spread, for the RMSE price unweighted, RMSE price weighted, and RMSE volatility weighted specifications]

Fig. 2 Number of options for which the model price falls within the quoted bid-ask spread

Figure 2 shows the number of vanilla options whose model price falls within the
quoted bid-ask spread as a function of the option moneyness for four of the calibration
procedures under investigation, namely the weighted and unweighted RMSE price
and implied volatility specifications without penalty term. To assess the impact of
moneyness on the model's ability to replicate option prices within their bid-ask spread,
we split the strike range into 21 classes: $K/S_0 < 0.5$, $0.5 \le K/S_0 < 0.55$, $0.55 \le K/S_0 <
0.6$, ..., $1.45 \le K/S_0 < 1.5$, and $K/S_0 \ge 1.5$. We clearly see that, at least for the price
specifications, option prices falling outside their quoted bid-ask spread are mainly
observed for deep out-of-the-money calls and puts. This trend is even more marked,
and also present in the implied volatility specifications, when we add a penalty term to
the objective function to constrain the model price within the market spread. Hence,
increasing the proportion threshold $p$ mainly boils down to limiting the set of calibration
instruments to close-to-the-money vanilla options.

In order to illustrate the impact of parameter uncertainty on the bid-ask spread of
exotics, we consider the following path-dependent options (with a maturity of $T = 3$
months):

1. Asian option

The payoff of Asian options depends on the arithmetic average of the stock price
from the issue date to the maturity date of the option. The fair prices of the Asian
call and put options with maturity $T$ are given by

$$AC = \exp(-rT)\,E^{\mathbb{Q}}\Big[\big(\operatorname{mean}_{0 \le t \le T} S_t - K\big)^+\Big], \qquad AP = \exp(-rT)\,E^{\mathbb{Q}}\Big[\big(K - \operatorname{mean}_{0 \le t \le T} S_t\big)^+\Big].$$

2. Lookback call option

The payoffs of lookback call and put options correspond to the call and put vanilla
payoffs where the strike is taken equal to the lowest and highest levels, respectively,
that the stock has reached during the option lifetime. The fair prices of the lookback
call and put with maturity $T$ are given by

$$LC = \exp(-rT)\,E^{\mathbb{Q}}\big[(S_T - m^S_T)^+\big], \qquad LP = \exp(-rT)\,E^{\mathbb{Q}}\big[(M^S_T - S_T)^+\big],$$

respectively, where $m^X_t$ and $M^X_t$ denote the minimum and maximum processes
of the process $X = \{X_t, 0 \le t \le T\}$, respectively:

$$m^X_t = \inf\{X_s, 0 \le s \le t\}, \qquad M^X_t = \sup\{X_s, 0 \le s \le t\}.$$

3. Barrier call option

The payoff of a one-touch barrier option depends on whether the underlying
stock price reaches the barrier $H$ during the lifetime of the option. We illustrate
the findings by looking at the up-and-in call and the down-and-in put price:

$$UIBC = \exp(-rT)\,E^{\mathbb{Q}}\big[(S_T - K)^+\,\mathbf{1}(M^S_T \ge H)\big]$$

$$DIBP = \exp(-rT)\,E^{\mathbb{Q}}\big[(K - S_T)^+\,\mathbf{1}(m^S_T \le H)\big].$$


4. Cliquet option

The payoff of a cliquet option depends on the sum of the stock returns over a
series of consecutive time periods, each local performance being first floored
and/or capped. Moreover, the final sum is usually further floored and/or capped
to guarantee a minimum and/or maximum overall payoff, such that cliquet options
protect investors against downside risk while still allowing for significant upside
potential. The cliquet we consider has a fair price given by

For the sake of comparison, we also price a 3-month at-the-money call option. Note that this option generally does not belong to the set of benchmark instruments since, most of the time, we cannot observe a market quote for an option with exactly the same maturity and moneyness.
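Once sample paths are available, the discounted payoffs above can be estimated from them. The following Python sketch shows one way to do this. It makes three assumptions: the paths sit in a NumPy array whose first column is $S_0$; averages, minima and maxima are taken over the simulation grid (discrete monitoring); and the two barrier options may use separate barriers `H_up` and `H_down`. The function name `exotic_prices` is hypothetical. The cliquet is left out because its contract terms are not given here.

```python
import numpy as np


def exotic_prices(paths, K, H_up, H_down, r, T):
    """Discounted Monte Carlo prices of the path-dependent payoffs above.

    paths: array of shape (n_paths, n_steps + 1); column 0 holds S_0.
    Averages, minima and maxima are taken over the simulation grid
    (discrete monitoring), which is an assumption of this sketch.
    """
    paths = np.asarray(paths, dtype=float)
    disc = np.exp(-r * T)
    s_T = paths[:, -1]
    avg = paths.mean(axis=1)        # arithmetic average of S_t
    run_min = paths.min(axis=1)     # m_T^S
    run_max = paths.max(axis=1)     # M_T^S
    payoffs = {
        "asian_call": np.maximum(avg - K, 0.0),
        "asian_put": np.maximum(K - avg, 0.0),
        "lookback_call": s_T - run_min,   # (S_T - m_T^S)^+ is always >= 0
        "lookback_put": run_max - s_T,    # (M_T^S - S_T)^+ is always >= 0
        "up_in_call": np.maximum(s_T - K, 0.0) * (run_max >= H_up),
        "down_in_put": np.maximum(K - s_T, 0.0) * (run_min <= H_down),
    }
    return {name: float(disc * p.mean()) for name, p in payoffs.items()}
```

Every payoff is computed from the same array, so all options are priced on a common set of sample paths.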

The path-dependent nature of exotic options requires the use of Monte Carlo simulation to generate sample paths of the underlying index. The stock price process

$$S_t = S_0\,\frac{\exp\big((r - q)t + X_t\big)}{\mathbb{E}^{\mathbb{Q}}[\exp(X_t)]}, \qquad X \sim \mathrm{VG}(\sigma, \nu, \theta),$$

is discretized using a first-order Euler scheme (for more details on the simulation, see Schoutens [21]). The (standard) Monte Carlo simulation is performed with one million scenarios and 252 trading days a year.
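The snippet below sketches how paths of the normalized stock price above could be generated. It deliberately uses a different scheme from the study's first-order Euler discretization. VG increments are drawn exactly as a gamma time-changed Brownian motion, $\Delta X = \theta G + \sigma\sqrt{G}\,Z$ with $G \sim \mathrm{Gamma}(\text{shape } \Delta t/\nu, \text{scale } \nu)$. The normalization uses the closed form $\mathbb{E}^{\mathbb{Q}}[\exp(X_t)] = (1 - \theta\nu - \sigma^2\nu/2)^{-t/\nu}$, which requires $1 - \theta\nu - \sigma^2\nu/2 > 0$. The function name, the seed handling and any parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_vg_paths(S0, r, q, sigma, nu, theta, T, n_steps, n_paths, seed=0):
    """Paths of S_t = S0 exp((r - q) t + X_t) / E[exp(X_t)], X ~ VG(sigma, nu, theta).

    VG increments are drawn exactly as a gamma time-changed Brownian motion
    (an assumption of this sketch; the text uses a first-order Euler scheme).
    """
    base = 1.0 - theta * nu - 0.5 * sigma**2 * nu
    if base <= 0.0:
        raise ValueError("E[exp(X_t)] is infinite for these VG parameters")
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Gamma subordinator increments: mean dt, variance nu * dt.
    g = rng.gamma(shape=dt / nu, scale=nu, size=(n_paths, n_steps))
    z = rng.standard_normal((n_paths, n_steps))
    dx = theta * g + sigma * np.sqrt(g) * z
    x = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dx, axis=1)], axis=1)
    t = np.linspace(0.0, T, n_steps + 1)
    # Closed-form VG normalization: log E[exp(X_t)] = -(t / nu) * log(base).
    log_mean = -(t / nu) * np.log(base)
    return S0 * np.exp((r - q) * t + x - log_mean)
```

With 252 steps per year and $T = 1/4$, one would use 63 steps. For one million scenarios, the paths would typically be generated in chunks to limit memory use.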

The bid and ask prices and the relative bid-ask spread (dollar bid-ask spread expressed as a proportion of the mid-price) of the different exotic options are shown in Figs. 3 and 4, respectively, for proportion thresholds p equal to 0.5, 0.75, and 0.9. For the sake of comparison, Fig. 5 shows the same results for the 3-month at-the-money call option. The figures clearly indicate that the impact of parameter uncertainty is much more marked for path-dependent derivatives than for (non-quoted) vanilla options. Indeed, the relative bid-ask spread is roughly an order of magnitude higher for the Asian call, lookback call, barrier call, and cliquet than for the vanilla call option. Moreover, we observe that a far-above-average relative spread for the vanilla call does not necessarily imply a far-above-average relative spread for the path-dependent options.

In order to assess the consistency of our findings, we repeated the Monte Carlo simulation 400 times for one fixed quoting day (namely October 1, 2008) with different sets of sample paths and computed the option relative spreads for each simulation. Figure 6 shows the resulting histogram for each relative spread and clearly brings out the consistency of the results: the relative spread is far more significant for the exotic options than for the vanilla options, whatever the set of sample paths considered. The consistency of the Monte Carlo study is further ensured by the fact that we used the same set of sample paths to price each option. Table 1, which shows the average price, standard deviation, and relative spread (across the 400 Monte Carlo simulations) for the price-weighted RMSE functional, confirms that the exotic bid-ask spreads are due to the nature of the exotic options rather than to the intrinsic uncertainty of Monte Carlo simulations. Indeed, for each exotic option, the Monte Carlo relative spread given in Table 1 is significantly smaller than the option spread depicted in Fig. 6.

Table 2 shows the average of the relative spread over the whole period under investigation for the different options under consideration. We clearly observe that the threshold p impacts the spread of the path-dependent options more severely: decreasing p leads to a sharper increase of the relative bid-ask spread for the exotic options than for the European call and put options. Besides, calibration risk is predominant for the up-and-in barrier call option and, to a smaller extent, for the Asian options. Table 3 shows the 95 % quantile of the relative bid-ask spreads. We clearly see that, in terms of extreme events, the riskiest options are the up-and-in barrier call option and the lookback options. In conclusion, our findings clearly illustrate the impact of the calibration methodology on the price of exotic options, suggesting that risk managers should take calibration uncertainty into account when assessing the safety margin.

Fig. 3 Evolution of exotic bid and ask prices through time

[Panel: Relative bid-ask spread for the Asian call option ($K = S_0$, $T = 1/4$)]

Fig. 4 Evolution of exotic relative bid-ask spreads (in absolute value) through time

Fig. 5 Evolution of vanilla bid and ask prices and relative bid-ask spread (in absolute value) through time
Fig. 6 Relative bid-ask spreads (in absolute value) for different Monte Carlo simulations
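In the conic finance construction summarized in the Conclusion, the test measures are the risk-neutral measures produced by the plausible calibrations. A cash flow is acceptable when its expectation is non-negative under every test measure, so the bid and ask are the lowest and highest model prices. The sketch below covers only this last step. It assumes the price of one claim under each test measure has already been computed, ideally on a common set of sample paths. The helper name is hypothetical.

```python
import numpy as np


def bid_ask_from_test_measures(model_prices):
    """Bid, ask and relative spread of one claim.

    model_prices: its price under each test measure (each risk-neutral
    measure produced by one of the plausible calibrations). A cash flow is
    acceptable if its expectation is non-negative under all test measures,
    so the bid is the lowest and the ask the highest price.
    """
    prices = np.asarray(model_prices, dtype=float)
    bid, ask = prices.min(), prices.max()
    mid = 0.5 * (bid + ask)
    rel_spread = (ask - bid) / mid  # dollar spread as a proportion of the mid-price
    return float(bid), float(ask), float(rel_spread)
```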


Table 1 Monte Carlo precision

| | Call | Put | Asian call | Asian put | Lookback call | Lookback put | UIBC | DIBP | Cliquet |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 77.201 | 73.647 | 29.270 | 27.487 | 254.70 | 283.32 | 43.690 | 57.681 | 0.0449 |
| Std | 0.1000 | 0.1202 | 0.0345 | 0.0552 | 0.1505 | 0.1856 | 0.0922 | 0.1225 | 5E-05 |
| Rel. spread (a) | 0.0078 | 0.0092 | 0.0073 | 0.0118 | 0.0034 | 0.0035 | 0.0131 | 0.0120 | 0.0066 |

(a) The Monte Carlo relative spread is defined as the maximum minus the minimum price divided by the average price across the 400 Monte Carlo simulations.
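The Monte Carlo relative spread of footnote (a) is a one-line computation. The sketch below assumes the 400 prices of one option, one per repeated simulation, are collected in a sequence. The helper name is hypothetical.

```python
import numpy as np


def mc_relative_spread(prices_across_runs):
    """(max - min) / mean of one option's price over repeated Monte Carlo runs."""
    p = np.asarray(prices_across_runs, dtype=float)
    return float((p.max() - p.min()) / p.mean())
```

As a rough consistency check: if the Monte Carlo errors are approximately normal, the range of 400 draws is about six standard deviations. For the call in Table 1 this gives $6 \times 0.1000 / 77.201 \approx 0.0078$, in line with the reported value.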


Table 2 Average relative bid-ask spreads (in %)

| p | Call | Put | Asian call | Asian put | Lookback call | Lookback put | UIBC | DIBP | Cliquet |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 2.59 | 2.53 | 29.82 | 27.41 | 17.89 | 24.97 | 43.80 | 7.20 | 17.26 |
| 0.75 | 1.66 | 1.72 | 19.81 | 18.68 | 11.09 | 16.78 | 22.11 | 3.75 | 11.50 |
| 0.9 | 1.37 | 1.46 | 12.18 | 11.77 | 5.97 | 9.75 | 10.31 | 2.44 | 6.55 |



Table 3 95 % quantile of relative bid-ask spreads (in %)

| p | Call | Put | Asian call | Asian put | Lookback call | Lookback put | UIBC | DIBP | Cliquet |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 5.75 | 5.24 | 77.95 | 67.56 | 79.19 | 90.87 | 102.32 | 28.41 | 49.75 |
| 0.75 | 3.89 | 3.67 | 51.99 | 51.98 | 68.04 | 81.13 | 72.11 | 12.88 | 40.64 |
| 0.9 | 3.15 | 3.26 | 40.46 | 40.44 | 27.28 | 43.58 | 33.03 | 5.06 | 24.36 |


3 Conclusion 

This paper sets the theoretical foundation of a new framework aimed at assessing the impact of calibration uncertainty. The main advantage of the proposed methodology resides in its metric-free nature, since the selection of test measures does not depend on any specified distance. Moreover, the paper links the concept of uncertainty to the recently developed conic finance theory by defining the test measures used to construct the cone of acceptable cash flows as the pricing measures resulting from any plausible calibration methodology, so that model and parameter uncertainties are naturally measured as bid-ask spreads. The numerical study has highlighted the significant impact of parameter uncertainty for a wide range of path-dependent options under the popular VG model.

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Derivative Pricing under the Possibility
of Long Memory in the supOU Stochastic 
Volatility Model 


Robert Stelzer and Jovana Zavisin 


Abstract We consider the supOU stochastic volatility model which is able to exhibit 
long-range dependence. For this model, we give conditions for the discounted stock 
price to be a martingale, calculate the characteristic function, give a strip where it 
is analytic, and discuss the use of Fourier pricing techniques. Finally, we present a 
concrete specification with polynomially decaying autocorrelations and calibrate it 
to observed market prices of plain vanilla options. 

Keywords Calibration • Fourier pricing • Levy basis • Long memory • Superposition of Ornstein-Uhlenbeck-type processes • Stochastic volatility
1 Introduction

The Ornstein-Uhlenbeck (OU)-type stochastic volatility (SV) model introduced in [3] is one of the most popular stochastic volatility models for prices of financial assets driven by a Levy process (see, e.g., [11, 25]). It covers many of the stylized facts typically encountered in financial data (cf. [10, 14]). Over the years, many variants have been introduced, for instance a variant with two-sided jumps in [1] or a multivariate extension in [21].
In this paper, we consider a variant of the model which can additionally cover the stylized fact of long-range dependence (or slower than exponentially decaying autocorrelations): the supOU stochastic volatility model. In this model, we specify the volatility as a superposition of Ornstein-Uhlenbeck (thus "supOU") processes, which were introduced in [2]. Various features of this volatility model (in a multidimensional setting) have been considered in [4, 5, 18, 26].

Typically, long-range dependence is obtained by using fractional Brownian motion or fractional Levy processes as the driving noises; see, e.g., [6, 7] for a critical discussion of such models for financial markets. In such models one cannot have jumps, as fractional Levy processes (cf. [16]) have continuous paths, and one is bound to have long memory. Our supOU model is a natural extension of the OU-type model that exhibits jumps and, depending on the parameters, can exhibit short or long memory. However, our model shares one disadvantage with models based on fractional processes, viz. that it is no longer Markovian. In this context, one should bear in mind that most Markov processes employed to model volatilities are geometrically ergodic and thus cannot exhibit long memory, although there also exist Markov processes with polynomial mixing coefficients and even long memory (see, e.g., [27]).

The focus of the present paper is on derivative pricing in, and calibration of, the univariate supOU SV model, similar to the papers [19, 20] for the (multivariate) OU-type SV model. To this end, we first briefly review the model in Sect. 2. In Sect. 3, we give conditions on the parameters such that the discounted stock price process is a martingale, which implies that under these conditions the model can be used to describe the risk neutral dynamics of a financial asset. Thereafter, we start Sect. 4 with a review of Fourier pricing. Then, we give the characteristic function of the log asset price in the supOU SV model and show conditions for the moment generating function to be sufficiently regular so that Fourier pricing is applicable. Finally, we present a concrete specification, the Γ-supOU SV model, in Sect. 5 and discuss its calibration to market data, which we illustrate with a small example using options on the DAX. We also discuss a subtle issue regarding how to employ the calibrated model to calculate prices of European options with a general maturity.


2 A Review of the supOU Stochastic Volatility Model 


We briefly review the definition and the most important known facts of the supOU 
stochastic volatility model introduced in [5]. More background on supOU processes 
can be found in [2, 4, 13, 26]. 

In the following, $\mathbb{R}_-$ denotes the set of negative real numbers and $\mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$ denotes the bounded Borel sets of $\mathbb{R}_- \times \mathbb{R}$.

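Definition 2.1 A family $\Lambda = \{\Lambda(B) : B \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})\}$ of real-valued random variables is called a real-valued Levy basis (infinitely divisible independently scattered random measure) on $\mathbb{R}_- \times \mathbb{R}$ if:

• the distribution of $\Lambda(B)$ is infinitely divisible for all $B \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$,

• for any $n \in \mathbb{N}$ and pairwise disjoint sets $B_1, \dots, B_n \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$ the random variables $\Lambda(B_1), \dots, \Lambda(B_n)$ are independent,

• for any sequence of pairwise disjoint sets $B_n \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$ with $n \in \mathbb{N}$ satisfying $\bigcup_{n\in\mathbb{N}} B_n \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$, the series $\sum_{n\in\mathbb{N}} \Lambda(B_n)$ converges a.s. and $\Lambda\big(\bigcup_{n\in\mathbb{N}} B_n\big) = \sum_{n\in\mathbb{N}} \Lambda(B_n)$.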

We consider only Levy bases with characteristic functions of the form

$$E\big(\exp(iu\Lambda(B))\big) = \exp\big(\varphi(u)\,\Pi(B)\big)$$

for all $u \in \mathbb{R}$ and all $B \in \mathcal{B}_b(\mathbb{R}_- \times \mathbb{R})$, where $\Pi = \pi \times \lambda$ is the product of a probability measure $\pi$ on $\mathbb{R}_-$ and the Lebesgue measure $\lambda$ on $\mathbb{R}$, and

$$\varphi(u) = iu\gamma_0 + \int_{\mathbb{R}_+}\big(e^{iux} - 1\big)\,\nu(dx)$$

is the cumulant transform of an infinitely divisible distribution on $\mathbb{R}_+$ with Levy-Khintchine triplet $(\gamma_0, 0, \nu)$. This is also the characteristic triplet of the underlying Levy process $L_t = \Lambda(\mathbb{R}_- \times (0, t])$ and $L_{-t} = \Lambda(\mathbb{R}_- \times (-t, 0))$ for $t \in \mathbb{R}_+$ (see, e.g., [24] for the relevant background on infinitely divisible distributions and Levy processes). We call the triplet $(\gamma_0, \nu, \pi)$ the generating triplet. Note that this means that $\gamma_0 \ge 0$, $\nu(\mathbb{R} \setminus \mathbb{R}_+) = 0$, and $\int_{|x|\le 1} |x|\,\nu(dx) < \infty$.

If $L$ is a pure jump Levy process with triplet $(0, 0, \nu)$ and jump measure $N(ds, dx)$, then turning the Poisson point process of jumps in $\mathbb{R} \times \mathbb{R}_+ \setminus \{0\}$ into one in $\mathbb{R} \times \mathbb{R}_+ \setminus \{0\} \times \mathbb{R}_-$, by marking all jumps with independent marks distributed according to $\pi$, produces the jump measure of a Levy basis with triplet $(\gamma_0, \nu, \pi)$.

In the supOU process defined next, this can be understood as assigning to every jump of a Levy process an individual exponential decay rate. We restrict our attention to positive supOU processes, as this is natural when using them to model a variance changing over time.
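The marking construction can be made concrete with a small simulation. In the sketch below, each jump $(\tau_k, x_k)$ of a compound Poisson subordinator receives an independent decay-rate mark $A_k \sim \pi$, and the positive supOU process is evaluated as $\Sigma_t = \sum_{\tau_k \le t} x_k e^{A_k(t - \tau_k)}$. It rests on several assumptions. The drift is $\gamma_0 = 0$. Jumps arrive at rate $a$ with Exp($b$) sizes, matching the Γ-type specification of Sect. 5. $\pi$ is the law of $B R$ with $R \sim \Gamma(\alpha, 1)$ and $B < 0$. The infinite past is truncated at a burn-in time, which biases $\Sigma$ downwards, and more so under long memory. Names and parameter values are hypothetical.

```python
import numpy as np


def simulate_supou(times, a, b, B, alpha, burn_in, seed=0):
    """Positive supOU process Sigma_t = sum_k x_k exp(A_k (t - tau_k)).

    Driving subordinator: compound Poisson with rate a and Exp(b) jump sizes
    (gamma_0 = 0). Each jump carries an independent mark A_k = B * R_k with
    R_k ~ Gamma(alpha, 1), B < 0. The infinite past is truncated at
    times[0] - burn_in, which biases Sigma slightly downwards.
    """
    times = np.asarray(times, dtype=float)
    rng = np.random.default_rng(seed)
    t0, t1 = times[0] - burn_in, times[-1]
    n = rng.poisson(a * (t1 - t0))
    tau = rng.uniform(t0, t1, size=n)                     # jump times
    x = rng.exponential(scale=1.0 / b, size=n)            # jump sizes
    A = B * rng.gamma(shape=alpha, scale=1.0, size=n)     # decay-rate marks, A < 0
    lag = times[:, None] - tau[None, :]                   # t - tau_k
    active = lag >= 0.0
    decay = np.exp(A * np.where(active, lag, 0.0))        # avoid overflow for lag < 0
    return np.where(active, x * decay, 0.0).sum(axis=1)
```

Under these assumptions, the stationary mean is $E[L_1]\,E[-1/A] = (a/b)/(|B|(\alpha - 1))$. This is finite exactly when the condition $\int -A^{-1}\pi(dA) < \infty$ of Theorem 2.2 holds.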

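Theorem 2.2 ([2, 4, 13]) Let $\Lambda$ be an $\mathbb{R}_+$-valued Levy basis on $\mathbb{R}_- \times \mathbb{R}$ with generating triplet $(\gamma_0, \nu, \pi)$. Assume

$$\int_{|x|>1} \ln(|x|)\,\nu(dx) < \infty \quad\text{and}\quad \int_{\mathbb{R}_-} -\frac{1}{A}\,\pi(dA) < \infty.$$

Then the process $\Sigma = (\Sigma_t)_{t\in\mathbb{R}}$ given by

$$\Sigma_t = \int_{\mathbb{R}_-}\int_{-\infty}^{t} e^{A(t-s)}\,\Lambda(dA, ds)$$

is well defined as a Lebesgue integral for all $t \in \mathbb{R}$, and it is stationary.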
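Moreover, $\Sigma_t \ge 0$ for all $t \in \mathbb{R}$, and the distribution of $\Sigma_t$ is infinitely divisible with characteristic function given by

$$E\big(e^{iu\Sigma_t}\big) = \exp\Big(iu\gamma_{\Sigma,0} + \int_{\mathbb{R}_+}\big(e^{iux} - 1\big)\,\nu_\Sigma(dx)\Big)$$

for all $u \in \mathbb{R}$, where

$$\gamma_{\Sigma,0} = -\int_{\mathbb{R}_-}\frac{\gamma_0}{A}\,\pi(dA), \qquad \nu_\Sigma(B) = \int_{\mathbb{R}_-}\int_0^\infty\int_{\mathbb{R}_+}\mathbf{1}_B\big(e^{As}x\big)\,\nu(dx)\,ds\,\pi(dA)$$

for all $B \in \mathcal{B}(\mathbb{R})$.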

As shown in [4, Theorem 3.12], the supOU process is adapted to the filtration generated by $\Lambda$ and has locally bounded paths. Provided $\pi$ has a finite first moment, one can take a supOU process to have càdlàg paths.

Definition 2.3 Let $W$ be a standard Brownian motion, $a = (a_t)_{t\in\mathbb{R}_+}$ a predictable real-valued process, $\Lambda$ an $\mathbb{R}_+$-valued Levy basis on $\mathbb{R}_- \times \mathbb{R}$ independent of $W$ with generating triplet $(\gamma_0, \nu, \pi)$, and let $L$ be its underlying Levy process. Let $\Sigma$ be a non-negative càdlàg supOU process and $\rho \in \mathbb{R}$. Assume that the logarithmic price process $X = (X_t)_{t\in\mathbb{R}_+}$ is given by

$$X_t = X_0 + \int_0^t a_s\,ds + \int_0^t \Sigma_s^{1/2}\,dW_s + \rho\,(L_t - \gamma_0 t),$$

where $X_0$ is independent of $\Lambda$. Then we say that $X$ follows a univariate supOU stochastic volatility model and refer to it by $SV\,supOU(a, \rho, \gamma_0, \nu, \pi)$.

In the following, we always use as filtration the one generated by W and A. 

In Definition 2.3, $X$ is supposed to be the log price of some financial asset, and $\rho$ is the typically negative correlation between jumps in the volatility and log asset prices, modeling the leverage effect. To ensure that the absolutely continuous drift is completely given by $a_t$, we subtract the drift $\gamma_0$ from the Levy process, noting that this can be done without loss of generality.

In [5], it has been shown that the model is able to exhibit long-range dependence in the squared log returns. The typical example leading to a polynomial decay of the autocovariance function of the squared returns, and to long-range dependence for certain choices of the parameters, is to take $\pi$ as a Gamma distribution mirrored at the origin. [13, 26] discuss in general which properties of $\pi$ result in long-range dependence.

3 Martingale Conditions 

Now we assume given a market with a deterministic numeraire (or bond) with price process $e^{rt}$ for some $r > 0$ and a risky asset with price process $S_t$.

We want to model the market by a supOU stochastic volatility model under the risk neutral dynamics. Thus, we need to understand when $\tilde S_t = e^{-rt}e^{X_t}$ is a martingale for the filtration $\mathbb{G} = (\mathcal{G}_t)_{t\in\mathbb{R}_+}$ generated by the Wiener process and the Levy basis, i.e., $\mathcal{G}_t = \sigma\big(\{\Lambda(B), W_s : s \in [0, t] \text{ and } B \in \mathcal{B}_b(\mathbb{R}_- \times (-\infty, t])\}\big)$ for $t \in \mathbb{R}_+$. Implicitly, we understand that the filtration is modified such that the usual hypotheses (see, e.g., [22]) are satisfied.

Theorem 3.1 (Martingale condition) Consider a market as described above. Suppose that

$$\int_{x>1}\big(e^{\rho x} - 1\big)\,\nu(dx) < \infty. \tag{1}$$

If the process $a = (a_t)_{t\in\mathbb{R}_+}$ satisfies

$$a_t = r - \frac{1}{2}\Sigma_t - \int_{\mathbb{R}_+}\big(e^{\rho x} - 1\big)\,\nu(dx), \tag{2}$$

then the discounted price process $\tilde S$ is a martingale.

Proof The arguments are straightforward adaptations of the ones in [19, Proposition 
2.10] or [20, Sect. 3]. 


4 Fourier Pricing in the supOU Stochastic Volatility Model 


Our aim now is to use the Fourier pricing approach in the supOU stochastic volatility 
model for calculating prices of European derivatives. 


4.1 A Review on Fourier Pricing 

We start with a brief review of the well-known Fourier pricing techniques introduced in [9, 23].

Let the price process of a financial asset be modeled as an exponential semimartingale $S = (S_t)_{0\le t\le T}$, i.e., $S_t = S_0 e^{X_t}$, $0 \le t \le T$, where $X = (X_t)_{0\le t\le T}$ is a semimartingale.

Let $r$ be the risk-free interest rate and let us assume that we are directly working under an equivalent martingale measure, i.e., the discounted price process $\tilde S = (\tilde S_t)_{0\le t\le T}$ given by $\tilde S_t = S_0 e^{X_t - rt}$ is a martingale.

We call the process $X$ the underlying process and, without loss of generality, we can assume that $X_0 = 0$. We denote by $s$ minus the logarithm of the initial value of $S$, i.e., $s = -\log(S_0)$.

Let $\hat f$ denote the Fourier transform of the function $f$, i.e., $\hat f(u) = \int_{\mathbb{R}} e^{iux}f(x)\,dx$.
Let now $f: \mathbb{R} \to \mathbb{R}_+$ be a measurable function that we refer to as the payoff function. Then the arbitrage-free price at time zero of the derivative with payoff $f(X_T - s)$ and maturity $T$ is the conditional expected discounted payoff under the chosen equivalent martingale measure, i.e., $V_f(X_T; s) = e^{-rT}E\big(f(X_T - s)\,\big|\,\mathcal{G}_0\big)$.

The following theorem gives the valuation formula for the price of the derivative 
paying f(Xj — s) at time T. 

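Theorem 4.1 ([12] Theorem 2.2, Remark 2.3) Let $f: \mathbb{R} \to \mathbb{R}_+$ be a payoff function and let $g_R(x) = e^{-Rx}f(x)$ for some $R \in \mathbb{R}$ denote the dampened payoff function. Define $\Phi_{X_T|\mathcal{G}_0}(u) := E\big(e^{uX_T}\,\big|\,\mathcal{G}_0\big)$, $u \in \mathbb{C}$. If

(i) $g_R \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$, (ii) $\Phi_{X_T|\mathcal{G}_0}(R) < \infty$, (iii) $\Phi_{X_T|\mathcal{G}_0}(R + i\,\cdot) \in L^1(\mathbb{R})$,

then

$$V_f(X_T; s) = \frac{e^{-rT - Rs}}{2\pi}\int_{\mathbb{R}} e^{-ius}\,\Phi_{X_T|\mathcal{G}_0}(R + iu)\,\hat f(iR - u)\,du.$$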

It is well known that for a European call option with maturity $T$ and strike $K > 0$, condition (i) is satisfied for $R > 1$, and that for the payoff function $f(x) = \max(e^x - K, 0) =: (e^x - K)^+$ the Fourier transform is

$$\hat f(u) = \frac{K^{1+iu}}{iu(1 + iu)}$$

for $u \in \mathbb{C}$ with $\operatorname{Im}(u) \in (1, \infty)$.
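Theorem 4.1 can be illustrated numerically. The following Python sketch evaluates the pricing integral for the call payoff with the transform above, truncating the integral to a finite interval and using the trapezoidal rule. The supOU moment generating function needs the results of the next subsections, so the sketch is checked instead on the Black-Scholes model, whose moment generating function $E[e^{zX_T}] = \exp\big(z(r - \sigma^2/2)T + z^2\sigma^2T/2\big)$ is known in closed form. This test case, the damping value $R = 1.5$ and the truncation settings are our assumptions.

```python
import numpy as np
from math import erf, exp, log, pi, sqrt


def call_transform(z, K):
    """Fourier transform of f(x) = (e^x - K)^+, valid for Im(z) > 1."""
    return np.exp((1.0 + 1j * z) * log(K)) / (1j * z * (1.0 + 1j * z))


def fourier_call_price(mgf, S0, K, r, T, R=1.5, u_max=200.0, n=40001):
    """Theorem 4.1 for the call payoff with damping R > 1 and s = -log(S0).

    mgf(z) must return E[exp(z X_T)] for complex z, where S_T = S0 exp(X_T).
    The integral over the real line is truncated to [-u_max, u_max] and
    evaluated with the trapezoidal rule.
    """
    s = -log(S0)
    u = np.linspace(-u_max, u_max, n)
    vals = np.exp(-1j * u * s) * mgf(R + 1j * u) * call_transform(1j * R - u, K)
    du = u[1] - u[0]
    integral = du * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return float(np.real(exp(-r * T - R * s) / (2.0 * pi) * integral))


def black_scholes_call(S0, K, r, T, sigma):
    """Closed-form reference price used to check the Fourier routine."""
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * N(d1) - K * exp(-r * T) * N(d2)
```

Replacing `mgf` by an implementation of (11) would give supOU prices, provided $R$ stays inside the strip on which that function is analytic.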

In the following, we calculate the characteristic/moment generating function for the supOU SV model and show conditions under which the above Fourier pricing techniques are applicable.


4.2 The Characteristic Function 


Consider the general supOU SV model with drift of the form $a_t = \mu + \gamma_0 + \beta\Sigma_t$. Note that then the discounted stock price is a martingale if and only if $\beta = -1/2$ and $\mu + \gamma_0 = r - \int_{\mathbb{R}_+}\big(e^{\rho x} - 1\big)\,\nu(dx)$.

Standard calculations as in [19, Theorem 2.5] or [20] give the following result, which is the univariate special case of a formula reported in [4, Sect. 5.2].

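Theorem 4.2 Let $X_0 \in \mathbb{R}$ and let the log-price process $X$ follow a supOU SV model of the above form. Then, for every $t \in \mathbb{R}_+$ and for all $u \in \mathbb{R}$, the characteristic function of $X_t$ given $\mathcal{G}_0$ is given by

$$\begin{aligned}\phi_{X_t|\mathcal{G}_0}(u) &:= E\big(e^{iuX_t}\,\big|\,\mathcal{G}_0\big)\\ &= \exp\Big\{iu(X_0 + \mu t) + \Big(iu\beta - \frac{u^2}{2}\Big)\int_{\mathbb{R}_-}\int_{-\infty}^{0}\frac{1}{A}\big(e^{A(t-s)} - e^{-As}\big)\,\Lambda(dA, ds)\\ &\qquad + \int_{\mathbb{R}_-}\int_0^t \varphi\Big(\frac{1}{A}\big(e^{A(t-s)} - 1\big)\Big(u\beta + \frac{iu^2}{2}\Big) + \rho u\Big)\,ds\,\pi(dA)\Big\}.\end{aligned}\tag{3}$$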
Note that, in contrast to the OU-type stochastic volatility model, where $(X, \Sigma)$ is a strong Markov process, $\Sigma$ is not Markovian in the supOU stochastic volatility model. Thus, conditioning on $X_0$ and $\Sigma_0$ is not equivalent to conditioning upon $\mathcal{G}_0$. Therefore, $\phi_{X_t|\mathcal{G}_0}(u)$ is not simply a function of $X_0$ and $\Sigma_0$. Instead, the whole past of the Levy basis enters via the $\mathcal{G}_0$-measurable

$$z_t := \int_{\mathbb{R}_-}\int_{-\infty}^{0}\frac{1}{A}\big(e^{A(t-s)} - e^{-As}\big)\,\Lambda(dA, ds),$$

which plays a role similar to that of the initial volatility $\Sigma_0$ in the OU-type stochastic volatility model. Like $\Sigma_0$ in the OU-type models, $z_t$ can be treated as an additional parameter to be determined when calibrating the model to market option prices. We can immediately see that the number of parameters to be estimated thus increases with each additional maturity. As will become clear later, the following observation is important.

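Lemma 4.3 $z_{t_1} \le z_{t_2}$ for all $t_1, t_2 \in \mathbb{R}_+$ such that $t_1 \le t_2$.

Proof For $t \in \mathbb{R}_+$ and $s \le 0$ we have $\frac{1}{A}\big(e^{A(t-s)} - e^{-As}\big) = \frac{e^{-As}}{A}\big(e^{At} - 1\big)$, and for $t_1 \le t_2$ one sees $e^{At_2} - 1 \le e^{At_1} - 1 \le 0$ since $A < 0$. As $e^{-As}/A < 0$, this implies that for $s \le 0 \le t_1 \le t_2$

$$\frac{e^{-As}}{A}\big(e^{At_1} - 1\big) \le \frac{e^{-As}}{A}\big(e^{At_2} - 1\big),$$

and, since $\Lambda$ is $\mathbb{R}_+$-valued, integrating with respect to $\Lambda(dA, ds)$ gives $z_{t_1} \le z_{t_2}$.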


4.3 Regularity of the Moment Generating Function 

In order to apply Fourier pricing, we now show where the moment generating function $\Phi_{X_T|\mathcal{G}_0}$ is analytic.

Let $\theta_L(u) = \gamma_0 u + \int_{\mathbb{R}_+}\big(e^{ux} - 1\big)\,\nu(dx)$ be the cumulant transform of the Levy basis (or rather of its underlying subordinator). If $\int_{x>1} e^{rx}\,\nu(dx) < \infty$ for all $r \in \mathbb{R}$ such that $r < \varepsilon$ for some $\varepsilon > 0$, then the function $\theta_L$ is analytic in the open set $S_L := \{z \in \mathbb{C} : \operatorname{Re}(z) < \varepsilon\}$, as can be seen, e.g., from the arguments at the start of the proof of [19, Lemma 2.7].

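Theorem 4.4 Let the measure $\nu$ satisfy

$$\int_{x>1} e^{rx}\,\nu(dx) < \infty \quad \text{for all } r \in \mathbb{R} \text{ such that } r < \varepsilon \tag{4}$$

for some $\varepsilon > 0$. Then the function $\Theta(u) = \int_{\mathbb{R}_-}\int_0^t \theta_L\big(u f_u(A, s)\big)\,ds\,\pi(dA)$ is analytic on the open strip

$$S := \{u \in \mathbb{C} : |\operatorname{Re}(u)| < \delta\} \quad\text{with}\quad \delta := -|\beta| - \frac{|\rho|}{t} + \sqrt{\Delta}, \tag{5}$$

where $\Delta := \Big(|\beta| + \frac{|\rho|}{t}\Big)^2 + \frac{2\varepsilon}{t}$.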

The rough idea of the proof is similar to [19, Theorem 2.8], but the fact that we 
now integrate over the mean reversion parameter adds significant difficulty, as now 
bounds independent of the mean reversion parameter need to be obtained and a very 
general holomorphicity result for integrals has to be employed. 

Proof Define

$$f_u(A, s) = \Big(\beta + \frac{u}{2}\Big)\frac{e^{A(t-s)} - 1}{A} + \rho. \tag{6}$$

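We first determine $\delta > 0$ such that for all $u \in \mathbb{R}$ with $|u| < \delta$ it holds that $|u f_u(A, s)| < \varepsilon$. We have

$$|u f_u(A, s)| \le \left|\frac{e^{A(t-s)} - 1}{A}\right|\Big(|\beta||u| + \frac{|u|^2}{2}\Big) + |\rho||u| \tag{7}$$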


by the triangle inequality. In order to find an upper bound for the latter term, we first note that elementary analysis shows

$$\left|\frac{e^{A(t-s)} - 1}{A}\right| \le t \tag{8}$$

for all $A < 0$ and $s \in [0, t]$. Thus, we have to find $\delta > 0$ such that $|u f_u(A, s)| \le t\big(|\beta||u| + \frac{|u|^2}{2}\big) + |\rho||u| < \varepsilon$ for all $u \in \mathbb{R}$ with $|u| < \delta$, i.e., to find the solutions of the quadratic equation

$$\frac{t}{2}|u|^2 + \big(t|\beta| + |\rho|\big)|u| - \varepsilon = 0. \tag{9}$$

Since for $u = 0$ the left-hand side of (9) is negative, namely equal to $-\varepsilon$, there exist one positive and one negative solution. The positive one is $\delta$ as given in (5).

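Now let $u \in S$, i.e., $u = v + iw$ with $v, w \in \mathbb{R}$, $|v| < \delta$. Observe that $\operatorname{Re}\big(u f_u(A, s)\big) = v f_v(A, s) - \frac{w^2}{2}\cdot\frac{e^{A(t-s)} - 1}{A}$ and $\frac{e^{A(t-s)} - 1}{A} \ge 0$ for $s \in [0, t]$ and $A < 0$. Hence, $\operatorname{Re}\big(u f_u(A, s)\big) \le v f_v(A, s)$. This implies that

$$\int_{x>1} e^{\operatorname{Re}(u f_u(A,s))x}\,\nu(dx) \le \int_{x>1} e^{v f_v(A,s)x}\,\nu(dx) < \infty$$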
due to $|v f_v(A, s)| < \varepsilon$ for $|v| < \delta$ and condition (4). Hence, for $u \in S$ the function $\theta_L\big(u f_u(A, s)\big) = \gamma_0 u f_u(A, s) + \int_{\mathbb{R}_+}\big(e^{u f_u(A,s)x} - 1\big)\,\nu(dx)$ is well defined.

$u f_u(A, s)$ is a polynomial in $u$ and thus an analytic function on $\mathbb{C}$, for all $s \in [0, t]$ and $A < 0$. The function $\theta_L$ is analytic in the set $S_L = \{z \in \mathbb{C} : \operatorname{Re}(z) < \varepsilon\}$. Thus, the function $\theta_L\big(u f_u(A, s)\big)$ is analytic in $S$, for all $s \in [0, t]$ and $A < 0$. By the holomorphicity theorem for parameter-dependent integrals (see, e.g., [15]), we can conclude that $\int_0^t \theta_L\big(u f_u(A, s)\big)\,ds$ is analytic in $S$, for all $A < 0$.

Defining $\varphi(u, A) := \int_0^t \theta_L\big(u f_u(A, s)\big)\,ds$, we now apply [17] to prove that $\Theta(u) = \int_{\mathbb{R}_-}\int_0^t \theta_L\big(u f_u(A, s)\big)\,ds\,\pi(dA) = \int_{\mathbb{R}_-}\varphi(u, A)\,\pi(dA)$ is analytic in $S$. Its conditions A1 and A2 are obviously satisfied. It remains to prove that condition A3 holds, i.e., that $\int_{\mathbb{R}_-}|\varphi(u, A)|\,\pi(dA)$ is locally bounded. First, observe that


$$\big|\theta_L\big(u f_u(A, s)\big)\big| \le \big|\gamma_0 u f_u(A, s)\big| + \int_{x\le 1}\big|e^{u f_u(A,s)x} - 1\big|\,\nu(dx) + \int_{x>1}\big|e^{u f_u(A,s)x} - 1\big|\,\nu(dx). \tag{10}$$


Using (8), we can bound the first summand in (10) by

$$\big|\gamma_0 u f_u(A, s)\big| \le |\gamma_0|\Big(t\Big(|\beta||u| + \frac{|u|^2}{2}\Big) + |\rho||u|\Big) =: B_1(u).$$

For the second summand, using Taylor's theorem we have that $\big|e^{u f_u(A,s)x} - 1\big| \le |u f_u(A, s)||x| + O\big(|u f_u(A, s)|^2|x|^2\big)$. Since $|u f_u(A, s)| \le t\big(|\beta||u| + \frac{|u|^2}{2}\big) + |\rho||u|$, for the remainder term of Taylor's formula we have

$$O\big(|u f_u(A, s)|^2|x|^2\big) \le O\Big(\Big(t\Big(|\beta||u| + \frac{|u|^2}{2}\Big) + |\rho||u|\Big)^2|x|^2\Big),$$

where the latter term converges to zero as $x \to 0$. If we define




$$K(u) := t\Big(|\beta||u| + \frac{|u|^2}{2}\Big) + |\rho||u|,$$

we obtain that

$$\int_{x\le 1}\big|e^{u f_u(A,s)x} - 1\big|\,\nu(dx) \le \int_{x\le 1} K(u)|x|\,\nu(dx) + \int_{x\le 1} O\big(K(u)^2|x|^2\big)\,\nu(dx) =: B_2(u),$$

which is finite due to the properties of the measure $\nu$.
Let $S_n := \{\mathbb{C} \ni u = v + iw : |v| \le \delta - 1/n\} \subset S$. Since the function $v \mapsto v f_v(A, s)$ is continuous on the compact set $V_n = \{v \in \mathbb{R} : |v| \le \delta - 1/n\}$, it attains its minimum and maximum on that set, i.e., there exists $v^* \in V_n$ such that $v f_v(A, s) \le v^* f_{v^*}(A, s) \le |v^* f_{v^*}(A, s)| =: K_n(u)$ for all $v \in V_n$. Note that $v^* \in V_n$ implies that $K_n(u) < \varepsilon$. Since $\operatorname{Re}\big(u f_u(A, s)\big) \le v f_v(A, s)$ and $\big|e^{u f_u(A,s)x}\big| = e^{\operatorname{Re}(u f_u(A,s))x} \le e^{K_n(u)x}$, it follows that

$$\int_{x>1}\big|e^{u f_u(A,s)x} - 1\big|\,\nu(dx) \le \int_{x>1} e^{K_n(u)x}\,\nu(dx) + \int_{x>1}\nu(dx) =: B_{3,n}(u),$$

which is finite due to (4) and the properties of the measure $\nu$.

Since $B_1(u)$, $B_2(u)$, and $B_{3,n}(u)$ depend neither on $s$ nor on $A$, we have $|\varphi(u, A)| \le t\big(B_1(u) + B_2(u) + B_{3,n}(u)\big)$ and

$$\int_{\mathbb{R}_-} t\big(B_1(u) + B_2(u) + B_{3,n}(u)\big)\,\pi(dA) = t\big(B_1(u) + B_2(u) + B_{3,n}(u)\big) < \infty,$$

so the function $t\big(B_1(u) + B_2(u) + B_{3,n}(u)\big)$ is integrable with respect to $\pi$. Since $\varphi(u, A)$ is analytic, and thus continuous, on $S_n$ for all $A < 0$, $|\varphi(u, A)|$ is also continuous on $S_n$ for all $A < 0$. By the dominated convergence theorem, it follows that $\int_{\mathbb{R}_-}|\varphi(u, A)|\,\pi(dA)$ is continuous and thus a locally bounded function on $S_n$. Since $n \in \mathbb{N}$ was arbitrary, this function is continuous and locally bounded on $S$, which completes the proof.
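The half-width $\delta$ in (5) is simply the positive root of (9). A short Python check follows; the helper name and the test parameter values are illustrative assumptions.

```python
from math import sqrt


def strip_half_width(t, beta, rho, eps):
    """delta from (5): positive root of (t/2) d^2 + (t|beta| + |rho|) d - eps = 0, cf. (9)."""
    c = abs(beta) + abs(rho) / t
    return -c + sqrt(c * c + 2.0 * eps / t)
```

As a usage example, consider the Γ-type Lévy measure $ab\,e^{-bx}$ of Sect. 5. There, condition (4) holds exactly for $r < b$, so one may take $\varepsilon = b$. With $\beta = -1/2$ from the martingale condition, the calibrated values reported in Sect. 5.2 ($\rho \approx -10.88$, $b \approx 29.40$), and $t = 1$, this gives $\delta \approx 2.34$. A damping parameter such as $R = 1.5$ would then lie inside the strip.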

Now, we can easily give conditions ensuring that (ii) in Theorem 4.1 is satisfied. 

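Corollary 4.5 Let $\int_{x>1} e^{rx}\,\nu(dx) < \infty$ for all $r \in \mathbb{R}$ such that $r < \varepsilon$ for some $\varepsilon > 0$. Then the moment generating function $\Phi_{X_T|\mathcal{G}_0}$ is analytic on the open strip $S := \{u \in \mathbb{C} : |\operatorname{Re}(u)| < \delta\}$ with $\delta := -|\beta| - \frac{|\rho|}{T} + \sqrt{\Delta}$, where $\Delta := \big(|\beta| + \frac{|\rho|}{T}\big)^2 + \frac{2\varepsilon}{T}$. Furthermore,

$$\Phi_{X_T|\mathcal{G}_0}(u) = \exp\Big\{u(X_0 + \mu T) + \Big(u\beta + \frac{u^2}{2}\Big)\int_{\mathbb{R}_-}\int_{-\infty}^{0}\frac{1}{A}\big(e^{A(T-s)} - e^{-As}\big)\,\Lambda(dA, ds) + \Theta(u)\Big\} \tag{11}$$

for all $u \in S$, where $\Theta$ is as in Theorem 4.4 with $t = T$.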

Proof This follows from Theorems 4.2 and 4.4, noting that an analytic function is uniquely identified by its values on a line, and from [19, Lemma A.1].

Very similar to [19, Theorem 6.11], we can now prove that condition (iii) in Theorem 4.1 is also satisfied for the supOU SV model.

Theorem 4.6 If $u \in \mathbb{C}$, $u = v + iw$ and $u \in S$ as defined in Theorem 4.4, then the map

$$w \mapsto \Phi_{X_T|\mathcal{G}_0}(v + iw)$$

is absolutely integrable.


5 Examples 

5.1 Concrete Specifications 

If we want to price a derivative by Fourier inversion in the supOU SV model, we have to calculate the inverse Fourier transform by numerical integration and, inside this, the double integral in $\Theta(u) = \int_{\mathbb{R}_-}\int_0^t \theta_L\big(u f_u(A, s)\big)\,ds\,\pi(dA)$. If we want to calibrate our model to market data, the optimizer will repeat this procedure very often, so it is important to consider specifications where at least some of the integrals can be calculated analytically.

Actually, it is not hard to see that one can use the standard specifications for v of 
the OU-type stochastic volatility model (see [3, 11, 20, 25]) which are named after 
the resulting stationary distribution of the OU-type processes. 

As in the case of a Γ-OU process, we can choose the underlying Levy process to be a compound Poisson process with characteristic triplet $\big(\gamma_0, 0, ab\,e^{-bx}\mathbf{1}_{\{x>0\}}\,dx\big)$ with $a, b > 0$, where, abusing notation, we specify the Levy measure by its density. Furthermore, we assume that $A$ follows a "negative" $\Gamma$-distribution, i.e., that $\pi$ is the distribution of $BR$, where $B \in \mathbb{R}_-$ and $R \sim \Gamma(\alpha, 1)$ with $\alpha > 1$. This is the specification typically used to obtain long memory or a polynomial decay of the autocorrelation function. We refer to this specification as the $\Gamma$-supOU SV model.

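Using (6), we have $u f_u(A, s) = \big(u\beta + \frac{u^2}{2}\big)\frac{e^{A(t-s)} - 1}{A} + \rho u$, so that

$$\Theta(u) = \int_{\mathbb{R}_-}\int_0^t \gamma_0\,u f_u(A, s)\,ds\,\pi(dA) + \int_{\mathbb{R}_-}\int_0^t\int_{\mathbb{R}_+}\big(e^{u f_u(A,s)x} - 1\big)\,\nu(dx)\,ds\,\pi(dA).$$

For the first summand in $\Theta(u)$ we see

$$\int_{\mathbb{R}_-}\int_0^t \gamma_0\,u f_u(A, s)\,ds\,\pi(dA) = \gamma_0\Big[\Big(u\beta + \frac{u^2}{2}\Big)(I_1 + I_2) + I_3\Big]$$

with

$$I_1 = \int_{\mathbb{R}_-}\frac{e^{At} - 1}{A^2}\,\pi(dA), \qquad I_2 = -t\int_{\mathbb{R}_-}\frac{1}{A}\,\pi(dA), \qquad I_3 = \int_{\mathbb{R}_-}\int_0^t \rho u\,ds\,\pi(dA).$$

For the three parts, we can now show:

$$I_1 = \begin{cases}\dfrac{(1 - Bt)^{2-\alpha} - 1}{B^2(\alpha - 1)(\alpha - 2)}, & \alpha \ne 2,\\[1ex] -\dfrac{\ln(1 - Bt)}{B^2}, & \alpha = 2,\end{cases} \qquad I_2 = -\frac{t}{B(\alpha - 1)}, \qquad I_3 = \rho u t.$$

Furthermore, setting $C(A) := \frac{u\beta + u^2/2}{A} - \rho u$, one obtains for the second summand in $\Theta(u)$, whenever $\operatorname{Re}\big(u f_u(A, s)\big) < b$,

$$\begin{aligned}\int_{\mathbb{R}_-}\int_0^t\int_{\mathbb{R}_+}\big(e^{u f_u(A,s)x} - 1\big)\,ab\,e^{-bx}\,dx\,ds\,\pi(dA) &= \int_{\mathbb{R}_-}\int_0^t \frac{a\,u f_u(A, s)}{b - u f_u(A, s)}\,ds\,\pi(dA)\\ &= -at - ab\int_{\mathbb{R}_-}\frac{1}{A\,(b + C(A))}\,\ln\!\left(\frac{(b + C(A))\,e^{-At} - \frac{u\beta + u^2/2}{A}}{b - \rho u}\right)\pi(dA).\end{aligned}$$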


Unfortunately, we have been unable to obtain a more explicit formula for this integral, so it has to be calculated numerically. In our example below, we have used the standard Matlab command "integral" for this. Note that how well-behaved this numerical integration is depends on the choice of $\pi$. For our choice, $\pi$ being a negative Gamma distribution implies a roughly (i.e., up to a power) exponentially decaying integrand for $A \to -\infty$, whereas the behavior at zero appears to be hard to determine.
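The following Python sketch evaluates $\Theta(u)$ for real $u$ in the Γ-supOU specification. The drift part uses the closed forms for $I_1$, $I_2$ and $I_3$ (with $\alpha \ne 2$). The jump part integrates the $s$-integrated expression over $A = BR$, $R \sim \Gamma(\alpha, 1)$, with SciPy's `quad`, used here as a stand-in for Matlab's `integral`. Both closed forms are cross-checked against brute-force quadrature. The sketch assumes real $u$ with $u f_u(A, s) < b$ throughout. It also assumes $u\beta + u^2/2 \ne -A(b - \rho u)$, which is a removable singularity of the closed form and needs separate care when $u\beta + u^2/2 > 0$. All function names are hypothetical.

```python
import math
from scipy.integrate import quad


def u_f(u, A, s, t, beta, rho):
    """u * f_u(A, s) from (6), real u and A < 0."""
    return (u * beta + 0.5 * u**2) * math.expm1(A * (t - s)) / A + rho * u


def jump_part_closed(u, A, t, a, b, beta, rho):
    """int_0^t a z / (b - z) ds with z = u f_u(A, s), s-integral in closed form."""
    c = u * beta + 0.5 * u**2
    k = b - rho * u
    # log of ((b + C(A)) e^{-At} - c / A) / (b - rho u), written to avoid overflow
    log_ratio = -A * t + math.log1p(-c * math.expm1(A * t) / (A * k))
    return -a * t - a * b * log_ratio / (A * k + c)


def jump_part_numeric(u, A, t, a, b, beta, rho):
    """Same quantity by direct quadrature in s, used as a cross-check."""
    def integrand(s):
        z = u_f(u, A, s, t, beta, rho)
        return a * z / (b - z)
    return quad(integrand, 0.0, t)[0]


def gamma_pdf(r, alpha):
    return r ** (alpha - 1.0) * math.exp(-r) / math.gamma(alpha)


def drift_part_closed(u, t, gamma0, beta, rho, B, alpha):
    """gamma_0 [(u beta + u^2/2)(I_1 + I_2) + I_3] for alpha != 2."""
    I1 = ((1.0 - B * t) ** (2.0 - alpha) - 1.0) / (B**2 * (alpha - 1.0) * (alpha - 2.0))
    I2 = -t / (B * (alpha - 1.0))
    I3 = rho * u * t
    return gamma0 * ((u * beta + 0.5 * u**2) * (I1 + I2) + I3)


def theta(u, t, gamma0, a, b, beta, rho, B, alpha):
    """Theta(u): closed-form drift part plus jump part integrated over A = B * R."""
    jump = quad(lambda r: jump_part_closed(u, B * r, t, a, b, beta, rho)
                * gamma_pdf(r, alpha), 0.0, math.inf)[0]
    return drift_part_closed(u, t, gamma0, beta, rho, B, alpha) + jump
```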

We can also choose the underlying Levy process as in an IG-OU model with parameters $\delta$ and $\gamma$, while keeping the choice of the measure $\pi$ the same. In this case, we have $\nu(dx) = \frac{\delta}{2\sqrt{2\pi}}\big(x^{-1} + \gamma^2\big)x^{-1/2}\exp\big(-\tfrac{1}{2}\gamma^2 x\big)\mathbf{1}_{\{x>0\}}\,dx$. The only difference compared to the previous case is in the calculation of the triple integral, which can also be partially calculated analytically, so that only a one-dimensional numerical integration is necessary.


5.2 Calibration and an Illustrative Example 

In this section, we calibrate the Γ-supOU SV model to market prices of European plain vanilla call options written on the DAX.

Let $t_1, t_2, \dots, t_M$ be the set of different times to maturity (in increasing order) for which we have market option prices. The parameters to be determined by calibration are $(\rho, a, b, B, \alpha, \gamma_0, z_{t_1}, \dots, z_{t_M})$, where $\rho$ describes the leverage, $a$ and $b$ are parameters of the measure $\nu$, $B$ and $\alpha$ are parameters of the measure $\pi$, and $\gamma_0$ is the drift parameter. Finally, $z_{t_1}, \dots, z_{t_M}$ are given by $z_{t_i} = \int_{\mathbb{R}_-}\int_{-\infty}^{0}\frac{1}{A}\big(e^{A(t_i - s)} - e^{-As}\big)\,\Lambda(dA, ds)$, $i = 1, \dots, M$.

Table 1 Calibrated supOU SV model parameters for DAX data of August 19, 2013

| $\rho$ | $a$ | $b$ | $B$ | $\alpha$ | $\gamma_0$ |
|---|---|---|---|---|---|
| -10.8797 | 0.2225 | 29.4025 | -0.0004 | 4.3632 | 0.0000 |

| $z_{t_1}$ | $z_{t_2}$ | $z_{t_3}$ | $z_{t_4}$ | $z_{t_5}$ | $z_{t_6}$ | $z_{t_7}$ | $z_{t_8}$ |
|---|---|---|---|---|---|---|---|
| 0.0012 | 0.0026 | 0.0038 | 0.0054 | 0.0093 | 0.0136 | 0.0225 | 0.0328 |

We calibrate by minimizing the root mean squared error between the Black-Scholes
implied volatilities corresponding to market and model prices, i.e.,

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{j=1}^{M}\sum_{i=1}^{N_j}\big(\mathrm{blsimpv}(C^{M}_{ij}) - \mathrm{blsimpv}(C_{ij})\big)^2}{\sum_{j=1}^{M} N_j}},$$

where blsimpv denotes the Black-Scholes implied volatility (MATLAB function), $M$ is the
number of different times to maturity, $N_j$ is the number of options for maturity $t_j$,
$\{C^{M}_{ij}\}$ is the set of market prices and $\{C_{ij}\}$ is the set of model prices, $i = 1, \ldots, N_j$,
$j = 1, \ldots, M$. Of course, minimizing the difference between Black-Scholes implied
volatilities is just one possible choice for the objective function. We note that this
data example is only intended as an illustrative proof of concept and that using other
objective functions, in particular ones including weights for the different options,
should improve the results.
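The objective can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB code used for the calibration: `blsimpv` is replaced by a plain bisection inverter of the Black-Scholes call price without dividends, and the helper names (`norm_cdf`, `bs_call`, `implied_vol`, `iv_rmse`) are hypothetical. As in the formula, the squared implied-volatility errors are averaged over all $\sum_j N_j$ options before taking the square root.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0: float, k: float, t: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying underlying."""
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / sd
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - sd)


def implied_vol(price: float, s0: float, k: float, t: float, r: float,
                lo: float = 1e-4, hi: float = 3.0, tol: float = 1e-10) -> float:
    """Invert bs_call by bisection; the call price is increasing in sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iv_rmse(quotes, model_prices, s0: float, r: float) -> float:
    """Root mean squared implied-volatility error over all options.

    quotes: list of (strike, maturity in years, market price);
    model_prices: model prices in the same order as quotes.
    """
    sq_errors = [
        (implied_vol(c_mkt, s0, k, t, r) - implied_vol(c_mod, s0, k, t, r)) ** 2
        for (k, t, c_mkt), c_mod in zip(quotes, model_prices)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

For the DAX data one would pass the index level 8366.29, the rate 0.0015173 (continuous compounding assumed here) and the maturities converted to years.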

We use closing prices of 200 DAX options on August 19, 2013. The level of DAX 
on that day was 8366.29. The data source was Bloomberg Finance L.P. and all the 
options were listed on EUREX. 

For the instantaneous risk-free interest rate, we used the 3-month LIBOR rate, 
which was 0.15173 %. The maturities of the options were 31, 59, 87, 122, 213, 304, 
486, and 668 days. The calibration procedure was performed in MATLAB. To avoid
getting stuck in local minima, the calibration was run several times with different
initial values, and the overall minimum RMSE was taken.
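The restart strategy can be sketched as below, assuming only that the objective is a Python callable on a parameter vector. The local optimizer is a simple compass search written for illustration; it is a stand-in for whatever MATLAB routine was actually used, and `compass_search`, `multistart` and the box `bounds` are hypothetical.

```python
import random


def compass_search(f, x0, step=0.5, tol=1e-8, max_evals=100_000):
    """Derivative-free local minimiser: try +/- step along each coordinate,
    accept any improvement, and halve the step when no move improves."""
    x = list(x0)
    fx = f(x)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = list(x)
                y[i] += sign * step
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break  # keep this move and continue with the next coordinate
        if not improved:
            step *= 0.5
    return x, fx


def multistart(f, bounds, n_starts=20, seed=0):
    """Run the local search from n_starts random points in the box `bounds`
    and keep the overall minimum."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = compass_search(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

In the calibration, `f` would map $(\rho, a, b, B, \alpha, \gamma_0, z_{t_1}, \ldots, z_{t_M})$ to the RMSE. For a one-dimensional function with two local minima, such as $f(x) = (x^2-1)^2 + 0.3x$, a single start may end in the worse basin near $x \approx 1$, whereas the restarts return the global minimum near $x \approx -1.04$.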

The implied parameters from the calibration procedure are given in Table 1. The fit
is good: the RMSE is 0.0046. We plot market against model Black-Scholes implied
volatilities in Fig. 1. Although the RMSE is very low and in plots of market against 
fitted model prices (not shown here) one sees basically no differences, Fig. 1 shows 
that our model fits the implied volatilities for medium and long maturities very well, 
but the quality of the fit for shorter maturities is lower. 

The vector of the parameters $(z_{t_i})_{i=1,\ldots,M}$ is indeed increasing with maturity
(cf. Lemma 4.3), although we refrained from including this restriction in our
optimization problem. The autocorrelation function of the Γ-supOU model exhibits
long memory for $\alpha \in (1, 2)$ (cf. [26, Sect. 2.2]). Since the calibration returns
$\alpha = 4.3632$, our market data are in line with a rather slow polynomial decay of the
autocorrelation function, in contrast to the exponential decay of the autocorrelation
function in the OU-type SV model, but the calibrated model does not exhibit long
memory. One should be very careful not to overinterpret these findings, as no
confidence intervals or hypothesis tests are available in connection with such a
standard calibration.

R. Stelzer and J. Zavišin

[Figure: implied volatility panels, including "Maturity in 31 days", "Maturity in 59 days", "Maturity in 87 days", "Maturity in 122 days", "Maturity in 486 days" and "Maturity in 668 days"]
Fig. 1 Calibration of the supOU model to call options on DAX: The Black-Scholes implied
volatilities. The implied volatilities from market prices are depicted by a dot, the implied volatilities
from model prices by a solid line
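The long-memory threshold for $\alpha$ can be checked with a short computation. The sketch assumes that the variance process is $\sigma_t^2 = \int_{\mathbb{R}_-}\int_{-\infty}^{t} e^{A(t-s)}\,\Lambda(dA, ds)$ with a Levy basis of intensity $\nu(dx)\,ds\,\pi(dA)$ and $\int_{\mathbb{R}_+} x^2\,\nu(dx) < \infty$, and that $\pi$ is the law of $BG$ with $B < 0$ and $G \sim \Gamma(\alpha, 1)$. This parametrization is consistent with the decay-rate moments reported later in this section, but it is an assumption here.

$$
\begin{aligned}
\operatorname{cov}(\sigma_t^2, \sigma_{t+h}^2)
&= \int_{\mathbb{R}_+} x^2\,\nu(dx)\int_{\mathbb{R}_-}\int_{-\infty}^{t} e^{A(t-s)}\,e^{A(t+h-s)}\,ds\,\pi(dA)
 = \int_{\mathbb{R}_+} x^2\,\nu(dx)\int_{\mathbb{R}_-}\frac{e^{Ah}}{-2A}\,\pi(dA),\\
r(h) &= \frac{\int_{\mathbb{R}_-} e^{Ah}(-A)^{-1}\,\pi(dA)}{\int_{\mathbb{R}_-}(-A)^{-1}\,\pi(dA)}
 = \frac{\int_0^\infty e^{-|B|hg}\,g^{\alpha-2}e^{-g}\,dg}{\int_0^\infty g^{\alpha-2}e^{-g}\,dg}
 = (1+|B|h)^{1-\alpha}, \qquad \alpha > 1.
\end{aligned}
$$

Hence $r$ decays polynomially and is integrable over $[0, \infty)$ exactly when $\alpha > 2$. For the calibrated $\alpha = 4.3632$, the exponent $1 - \alpha \approx -3.36$ gives an integrable autocorrelation function, i.e. no long memory.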

The leverage parameter $\rho$ is negative, which implies a negative correlation between
jumps in the volatility and returns. Hence, the typical leverage effect is present.
The drift parameter $\gamma_0$ of the underlying Levy basis is estimated to be practically
zero. So our calibration suggests that a driftless pure-jump Levy basis may be quite
adequate.

Let us briefly turn to a comparison with the OU-type stochastic volatility model
(cf. [19] or [20]), noting that a detailed comparison with various other models is
certainly called for, but beyond the scope of the present paper. For some $\beta < 0$,
looking at a sequence of Γ-supOU models with $\alpha_n = n$, $B_n = \beta/n$ and all other
parameters fixed shows that the mean reversion probability measures $\pi_n$ converge
weakly to the delta distribution at $\beta$. So the OU model is in some sense a limiting
case of the supOU model. However, the limiting model is very different from all
approximating models: it is Markovian and has the same decay rate for all jumps,
whereas the approximating supOU models have all negative real numbers as possible
decay rates for individual jumps. This implies that, in connection with real data, the
behavior of the OU and the supOU model can well be rather different. Calibrating a
Γ-OU model to our DAX data set (so the only parameter now different is $\pi$, which is
a Dirac measure) actually returns a globally better fit (the RMSE is 0.0037). The plots
of market against model implied volatilities (Fig. 2 shows only the four largest
maturities) all look quite similar to the ones in Fig. 1, although, looking closely, the
fit for the early maturities is definitely better. Yet, there is one big exception: the
last maturity, where the supOU model fits much better. Whereas the rate of the
underlying compound Poisson process is $a = 0.2225$ in the supOU model, it is 1.2671
in the OU model. The mean of the decay rates is $-0.0017$ in the supOU model, while
the decay rate in the OU case is $-1.3906$. Noting that the standard deviation of the
decay rates is 0.0008 in the supOU model, the two calibrated models are indeed in
many respects rather different.
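The decay-rate moments quoted above can be reproduced in a few lines if one assumes (this parametrization is not spelled out in this section) that $\pi$ is the law of $BG$ with $G \sim \Gamma(\alpha, 1)$, so that the decay rates have mean $\alpha B$ and standard deviation $\sqrt{\alpha}\,|B|$. The same formulas show why $\alpha_n = n$, $B_n = \beta/n$ collapses $\pi_n$ onto $\beta$: the mean stays at $\beta$ while the standard deviation $|\beta|/\sqrt{n}$ vanishes. The function name is hypothetical.

```python
import math


def decay_rate_moments(alpha: float, B: float) -> tuple:
    """Mean and standard deviation of A = B * G with G ~ Gamma(alpha, 1).

    Assumed parametrization of pi (shape alpha, scale B < 0).
    """
    return alpha * B, math.sqrt(alpha) * abs(B)


# Calibrated values from Table 1
mean, std = decay_rate_moments(4.3632, -0.0004)
print(f"mean = {mean:.4f}, std = {std:.4f}")  # mean = -0.0017, std = 0.0008

# OU limit: alpha_n = n, B_n = beta / n keeps the mean at beta, std = |beta| / sqrt(n) -> 0
beta = -1.3906  # decay rate of the calibrated OU model, used only as an example value
for n in (1, 100, 10_000):
    print(n, decay_rate_moments(n, beta / n))
```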

Remark 5.1 (How to price options with general maturities?) After having calibrated
a model to observed liquid market prices, one often wants to use it to price other
(exotic) derivatives. Looking at a European derivative with payoff $f(S_T)$ for some
measurable function $f$ and maturity $T > 0$, one soon realizes that we can only
obtain its price directly if $T \in \{t_1, t_2, \ldots, t_M\}$, as only then do we know $z_T$, and thus the
characteristic function $\Phi_{X_T \mid \mathcal{G}}$ and therefore the distribution of the price process
at time $T$ conditional on our current information $\mathcal{G}$. This is not desirable, and the
problem is that we assume that we know $\mathcal{G}$ in theory, but we have only limited
information in the market prices, which we can use to recover only part of the
information in $\mathcal{G}$.

It seems that to get $z_t$ for all $t \in \mathbb{R}_+$ one really needs to know the whole past
of $\Lambda$, i.e., all jumps before time 0 and the associated times and decay rates. This is
clearly not feasible. A detailed analysis of the dependence of $z_t$ on $t$ is beyond the
scope of this paper. But we briefly want to comment on possible ad hoc solutions
to “estimate” $z_T$ based on $\{z_{t_i}\}_{i=1,\ldots,M}$. The first one is to either interpolate or fit
a parametric curve $t \mapsto z_t$ to the “observed” $\{z_{t_i}\}_{i=1,\ldots,M}$. If one also ensures in this
procedure that the curve is increasing in $t$ (cf. Lemma 4.3), one should get a reasonable
approximation, especially when the grid is fine and one considers maturities in $[t_1, t_M]$.

[Figure: implied volatility panels for the last four maturities, e.g. "Maturity in 304 days"]
Fig. 2 Calibration of the OU model to call options on DAX: The implied volatilities from market
prices are depicted by a dot, the implied volatilities from model prices by a solid line. Last four
maturities only
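The first ad hoc solution can be sketched with piecewise-linear interpolation, which automatically preserves the monotonicity of the calibrated values. The sketch assumes that the entries $z_{t_1}, \ldots, z_{t_8}$ of Table 1 belong to the maturities 31, 59, 87, 122, 213, 304, 486 and 668 days in that order, and it deliberately refuses to extrapolate outside $[t_1, t_M]$; `z_interp` is a hypothetical name.

```python
from bisect import bisect_left

# Maturities (days) and calibrated z_{t_i} from Table 1, assumed to be listed in the same order
MATURITIES_DAYS = [31, 59, 87, 122, 213, 304, 486, 668]
Z_CALIBRATED = [0.0012, 0.0026, 0.0038, 0.0054, 0.0093, 0.0136, 0.0225, 0.0328]


def z_interp(t_days: float) -> float:
    """Piecewise-linear interpolation of t -> z_t on [t_1, t_M]; preserves monotonicity."""
    ts, zs = MATURITIES_DAYS, Z_CALIBRATED
    if not ts[0] <= t_days <= ts[-1]:
        raise ValueError("extrapolation outside [t_1, t_M] is not attempted")
    i = bisect_left(ts, t_days)
    if ts[i] == t_days:
        return zs[i]
    w = (t_days - ts[i - 1]) / (ts[i] - ts[i - 1])  # weight of the right grid point
    return (1.0 - w) * zs[i - 1] + w * zs[i]


print(z_interp(150))  # lies between the values at 122 and 213 days
```

A smoother monotone fit (e.g. a monotone spline or a parametric curve) could replace the linear rule; the linear version is used here only because its monotonicity is obvious.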

From the probabilistic point of view, one wants to compute $E(z_T \mid \{z_{t_i}\}_{i=1,\ldots,M})$ for
$T \notin \{t_1, t_2, \ldots, t_M\}$. Whether and how this conditional expectation can be calculated
is again a question for future investigations. But what one can calculate easily is
the best (in the $L^2$ sense) linear predictor of $z_T$ given $\{z_{t_i}\}_{i=1,\ldots,M}$. One simply
needs to adapt standard time series techniques (like the innovations algorithm or
linear $L^2$ filtering, see, e.g., [8]), noting that one has

$$\operatorname{cov}(z_t, z_u) = \int_{\mathbb{R}_-}\int_{\mathbb{R}_-}\frac{e^{-2As}}{A^2}\,(e^{At}-1)(e^{Au}-1)\,ds\,\pi(dA)\int_{\mathbb{R}_+}x^2\,\nu(dx) \qquad \forall\, t, u \in \mathbb{R}_+.$$
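For reference, the best linear predictor mentioned above has the standard closed form below, assuming the $M \times M$ covariance matrix $\Gamma$ is invertible and the means $\mathbb{E}[z_t]$ are available; the innovations algorithm computes the same quantity recursively. Here $\mathbf{z} = (z_{t_1}, \ldots, z_{t_M})^{\top}$.

$$
\hat z_T = \mathbb{E}[z_T] + \gamma_T^{\top}\Gamma^{-1}\big(\mathbf{z} - \mathbb{E}[\mathbf{z}]\big), \qquad
\Gamma_{ij} = \operatorname{cov}(z_{t_i}, z_{t_j}), \quad (\gamma_T)_i = \operatorname{cov}(z_T, z_{t_i}),
$$
$$
\mathbb{E}\big[(z_T - \hat z_T)^2\big] = \operatorname{var}(z_T) - \gamma_T^{\top}\Gamma^{-1}\gamma_T .
$$

For $T = t_i$, the vector $\gamma_T$ is the $i$-th column of $\Gamma$, so the predictor returns $z_{t_i}$ itself with zero error.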


Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
A Two-Sided BNS Model for Multicurrency
FX Markets 

Karl Friedrich Bannor, Matthias Scherer and Thorsten Schulz 


Abstract We present a multivariate jump-diffusion model incorporating stochastic
volatility and two-sided jumps for multicurrency FX markets, which is an extension
of the univariate Γ-OU-BNS model introduced by [2]. The model can be considered
a multivariate variant of the two-sided Γ-OU-BNS model (cf. [1]). We discuss FX
option pricing and provide a calibration exercise, modeling two FX rates with a 
common currency by a bivariate model and calibrating the dependence parameters 
to the implied FX volatility surface. 

Keywords Barndorff-Nielsen-Shephard model • Stochastic volatility • Multivariate
model • Jump-diffusion model • Multicurrency FX markets


1 Introduction 


For derivatives valuation, the Black-Scholes model, presented in the seminal paper
[4], generated a wave of stochastic models for the description of stock prices. Since
the assumptions of the Black-Scholes model (normally distributed log-returns,
independent returns) are observed neither in time series of stock prices nor in option
markets (implicitly expressed in terms of the volatility surface), several alternative
models have been developed to relax these assumptions. Some models, e.g. [9, 23],
account for stochastic volatility, while others, e.g. [12, 16], enrich the original
Black-Scholes model with jumps. Both approaches have been combined in the models
of, e.g., [3, 6]. Another approach combining stochastic volatility with jumps, namely
upward jumps in the volatility and downward jumps in the asset-price process, and
employing Levy subordinator-driven Ornstein-Uhlenbeck processes, is available with
the Barndorff-Nielsen-Shephard (BNS) model class, presented in [2] and extended in
several papers (e.g. [18]). A multivariate extension of the BNS model class employing
matrix subordinators is designed in [20], and pricing in this model is scrutinized in
[17]. In the special case of a Γ-OU-BNS model, a tractable variant of a multivariate
BNS model based on subordination of compound Poisson processes was developed
by [15]. This model allows for a separate calibration of the single assets (each
following a univariate Γ-OU-BNS model) and of the dependence structure.

Besides options on stocks, these models have also been used to price derivatives
on other underlyings. When modeling foreign exchange (FX) rates instead of stock
prices, one has to cope with two different interest rates as well as with identifying
the actual tradeable assets. The Black-Scholes model was adapted to FX markets by
[8]. Many of the models mentioned above have been employed for FX rate modeling,
e.g. [3, 9]. Since the original BNS model assumes only downward jumps in the
asset-price process, [1] extend the BNS model class to additionally incorporate
positive jumps, which are needed for the realistic modeling of FX rates; the extended
model calibrates much better to FX option surfaces.

In this paper, we unify the extensions of the BNS model from [1, 15] and introduce
a multivariate Γ-OU-BNS model with time-changed compound Poisson drivers
incorporating dependent jumps in both directions, generalizing both the univariate
two-sided Γ-OU-BNS model and the multivariate “classical” Γ-OU-BNS model.
Since the two-sided Γ-OU-BNS model seems to be particularly suitable for the
modeling of FX rates, we consider a multivariate two-sided Γ-OU-BNS model a
sensible choice for the valuation of multivariate FX derivatives such as best-of-two
options. Since the multivariate two-sided model accounts for both joint and single
jumps in the FX rates, the jump behavior of the modeled FX rates resembles reality
better than models employing only joint or only single jumps, as illustrated in Fig. 1.
Furthermore, a multivariate two-sided BNS model for FX rates with a common
currency also implies a jump-diffusion model for the third FX rate via quotient or
product processes. A crucial feature of our multivariate approach is the separability
of the univariate models from the dependence structure, i.e. one has two sets of
parameters that can be determined in consecutive steps: parameters determining each
univariate model and parameters determining the dependence. This feature provides
tractability for practical applications like simulation or calibration on the one hand,
and simplifies the interpretation of the model parameters on the other.

Instead of modeling the FX spot rates only, one could model FX forward rates
to obtain a model setup suited for pricing cross-currency derivatives depending on FX
forward rates, for example cross-currency swaps. Multicurrency models built upon
FX forward rates (see e.g. [7]) on the one hand provide the flexibility to price such
derivatives; on the other hand, however, these models do not provide the crucial
property of separating the dependence structure from the univariate models, which
makes it extremely difficult to calibrate such a multivariate model in a sound manner.

Fig. 1 The logarithmic returns of EUR-SEK and USD-SEK FX rates over time. Assuming that
every logarithmic return exceeding three standard deviations (dashed lines) from the mean can be
interpreted as a jump (obviously, smaller jumps occur as well, but may be indistinguishable from
movement originating in the Brownian noise), one can see that joint as well as separate jumps in the
EUR-SEK and the USD-SEK logarithmic returns occur. Clearly, this 3-standard-deviation criterion
is just a rule of thumb; however, [10] investigated the necessity of both common and individual
jumps in a statistically thorough manner. Hence, a multivariate FX model capturing the stylized
facts of both joint and separate jumps can be valuable. The data was provided by Thomson Reuters
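The rule of thumb from the caption of Fig. 1 can be sketched as follows. The sketch assumes two aligned series of log-returns (no data are reproduced here), flags returns deviating from the sample mean by more than three sample standard deviations, and counts joint and separate jumps. The function names are hypothetical, and the procedure is only the heuristic described in the caption, not the statistical analysis of [10].

```python
import statistics


def flag_jumps(returns, k=3.0):
    """Flag log-returns deviating from the sample mean by more than k sample std devs."""
    m = statistics.fmean(returns)
    s = statistics.stdev(returns)
    return [abs(x - m) > k * s for x in returns]


def classify_jumps(r1, r2, k=3.0):
    """Count joint jumps and jumps occurring in only one of two aligned return series."""
    j1, j2 = flag_jumps(r1, k), flag_jumps(r2, k)
    joint = sum(a and b for a, b in zip(j1, j2))
    only_1 = sum(a and not b for a, b in zip(j1, j2))
    only_2 = sum(b and not a for a, b in zip(j1, j2))
    return joint, only_1, only_2
```

Applied to EUR-SEK and USD-SEK log-returns, non-zero counts in all three categories would be the pattern described in the caption.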

The remainder of the paper is organized as follows: In Sect. 2, we recall the two-sided
Barndorff-Nielsen-Shephard model constructed in [1] and outline stylized facts of
its trajectories. In Sect. 3, we introduce a multivariate version of the two-sided
Γ-OU-BNS model, using the time-change construction from [15] to incorporate
dependence between the jump drivers. Section 4 focuses on the specific obstacles
occurring when modeling FX rates in a multivariate two-sided Γ-OU-BNS model,
particularly the dependence structure of joint jumps and the implied model for a third
FX rate. In Sect. 5, we describe a calibration of the model to implied volatility
surfaces and show how the model can be used to price multivariate derivatives. We
then evaluate the model in a numerical case study. Finally, Sect. 6 concludes.


2 The Two-Sided Barndorff-Nielsen-Shephard Model Class 


We briefly motivate the construction and main features of the two-sided BNS model
class. The classical BNS model accounts for the leverage effect, a feature of stock
returns, by incorporating negative jumps in the asset-price process, accompanied by
upward jumps in the stochastic variance. While downward jumps might be sufficient
for modeling stock-price dynamics, they are not suitable for modeling FX rates,
where one-sided jumps contradict economic intuition. Hence, [1] develop an
extension of the BNS model which allows for two-sided jumps and is able to capture
the symmetric nature of FX rates.

We say that a stochastic process $\{S_t\}_{t\geq 0}$ follows a two-sided BNS model (abbreviated
BNS2 model) if the log-price $X_t := \log S_t$ follows the dynamics of the SDEs

$$dX_t = (\mu + \beta\sigma_t^2)\,dt + \sigma_t\,dW_t + \rho_+\,dZ_t^+ + \rho_-\,dZ_t^-,$$
$$d\sigma_t^2 = -\lambda\sigma_t^2\,dt + dZ_t^+ + dZ_t^-,$$

with independent Levy subordinators $Z^+ = \{Z_t^+\}_{t\geq 0}$ and $Z^- = \{Z_t^-\}_{t\geq 0}$, $W =
\{W_t\}_{t\geq 0}$ being a Brownian motion independent of $Z^+$ and $Z^-$, $\mu, \beta \in \mathbb{R}$, $\lambda > 0$,
$\rho_+ > 0$ and $\rho_- < 0$.¹ If the Levy drivers $Z^+$, $Z^-$ are independent copies of each other,
we call the model a reduced two-sided BNS model. If, additionally, $\rho_+ = -\rho_-$, we
have a symmetric situation, with upward jumps occurring as likely as downward
jumps. Furthermore, the average absolute jump sizes in the log-prices coincide. Thus,
we call the model a symmetric BNS model or SBNS model. In a calibration exercise of
[1], the SBNS model produced decent calibration results while limiting the number
of parameters to five.

In contrast to the classical BNS model, the BNS2 model has two independent Levy
subordinators $Z^+$, $Z^-$ generating jumps in the asset-price process in opposite
directions, but both accounting for upward jumps in the variance process $\sigma^2 =
\{\sigma_t^2\}_{t\geq 0}$. Thus, shocks in the asset price are always accompanied by upward jumps in
the variance, regardless of the jump direction. Furthermore, the variance process is still
a Levy subordinator-driven Ornstein-Uhlenbeck process. As discussed in [1], the
symmetric nature of the two-sided BNS model makes it particularly suitable for FX
rate modeling, and it calibrates well to option surfaces on FX rates.

An important example is the special case where the Levy drivers $Z^+$, $Z^-$ are
compound Poisson processes with exponential jump heights. In this case, we call the
model a two-sided Γ-OU-BNS model. The log-price of a two-sided Γ-OU-BNS
model has a closed-form characteristic function (cf. [1]) and hence allows for rapid
calibration to vanilla prices by means of Fourier-pricing methods as introduced in
[5, 21]. A typical trajectory of the two-sided Γ-OU-BNS model can be found in
Fig. 2. It can clearly be seen that shocks in the FX rate process, e.g. caused by
macroeconomic turbulence or unanticipated interest-rate movements, cause a sudden
rise in volatility. As time goes by without the arrival of new shocks, volatility
calms down again.
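The closed-form characteristic function of [1] is not reproduced here, so the sketch below plugs in the Garman-Kohlhagen (lognormal FX) characteristic function purely as a stand-in; any characteristic function of $\log S_T$ that can be evaluated at the complex arguments $u - i$ could be passed instead. It uses Gil-Pelaez inversion with a midpoint rule, which is one Fourier approach among several and not necessarily the method of [5, 21]; the function names, truncation point and grid size are assumptions.

```python
import cmath
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gk_call(s0, k, t, r_d, r_f, sigma):
    """Garman-Kohlhagen (Black-Scholes for FX) call price in closed form."""
    sd = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (r_d - r_f + 0.5 * sigma**2) * t) / sd
    return s0 * math.exp(-r_f * t) * norm_cdf(d1) - k * math.exp(-r_d * t) * norm_cdf(d1 - sd)


def gk_char_fn(s0, t, r_d, r_f, sigma):
    """Characteristic function u -> E[exp(i u log S_T)] of the stand-in model."""
    m = math.log(s0) + (r_d - r_f - 0.5 * sigma**2) * t
    return lambda u: cmath.exp(1j * u * m - 0.5 * sigma**2 * t * u * u)


def call_from_char_fn(phi, k, t, r_d, u_max=200.0, n=20_000):
    """European call via Gil-Pelaez inversion; phi must accept complex arguments."""
    log_k = math.log(k)
    fwd = phi(-1j).real  # E[S_T] under the pricing measure
    h = u_max / n
    p1 = p2 = 0.0
    for j in range(n):
        u = (j + 0.5) * h  # midpoint rule sidesteps the removable singularity at u = 0
        damp = cmath.exp(-1j * u * log_k) / (1j * u)
        p1 += (damp * phi(u - 1j)).real / fwd  # exercise probability under the share measure
        p2 += (damp * phi(u)).real             # exercise probability under the pricing measure
    p1 = 0.5 + p1 * h / math.pi
    p2 = 0.5 + p2 * h / math.pi
    return math.exp(-r_d * t) * (fwd * p1 - k * p2)
```

Comparing `call_from_char_fn` with the closed-form `gk_call` for the stand-in model is a cheap check of the quadrature settings before switching to the characteristic function of the two-sided model.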


1 Compared to the original formulation of the model in [1] and the original BNS model from [18],
we do not change the clock of the subordinators to $t \mapsto \lambda t$. This formulation is equivalent and
handier in the upcoming multivariate construction.


[Figure: simulated FX rate path; horizontal axis: Time]
Fig. 2 Sample path of a two-sided BNS model, generated from calibrated parameters. The FX rate
process exhibits positive and negative jumps
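A path like the one in Fig. 2 can be generated with a simple time-stepping scheme. The sketch assumes compound Poisson drivers with intensities `c_p`, `c_m` and Exp(`eta_p`), Exp(`eta_m`) jump sizes, uses the SDEs above with the clock of footnote 1, and applies an Euler step to the diffusion part and exact exponential decay to the variance between steps. All parameter names and values are hypothetical, not the calibrated ones, and the drift is left general rather than set to a risk-neutral value.

```python
import math
import random


def poisson(rng, mean):
    """Poisson(mean) draw by Knuth's multiplication method (fine for small per-step means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_bns2(x0, v0, mu, beta, lam, rho_p, rho_m, c_p, eta_p, c_m, eta_m,
                  t_end=1.0, n_steps=1000, seed=0):
    """Time-stepping scheme for the two-sided Gamma-OU-BNS SDEs.

    Z+ (Z-) is compound Poisson with intensity c_p (c_m) and Exp(eta_p) (Exp(eta_m)) jumps.
    Returns a list of (t, X_t, sigma_t^2).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    x, v = x0, v0
    path = [(0.0, x, v)]
    for i in range(1, n_steps + 1):
        dz_p = sum(rng.expovariate(eta_p) for _ in range(poisson(rng, c_p * dt)))
        dz_m = sum(rng.expovariate(eta_m) for _ in range(poisson(rng, c_m * dt)))
        # log-price: drift and diffusion use the left-point variance; jumps enter via rho_p, rho_m
        x += (mu + beta * v) * dt + math.sqrt(v * dt) * rng.gauss(0.0, 1.0) \
            + rho_p * dz_p + rho_m * dz_m
        # variance: exact exponential decay over the step, then both jump sums push it up
        v = v * math.exp(-lam * dt) + dz_p + dz_m
        path.append((i * dt, x, v))
    return path
```

Whichever sign the log-price jump takes ($\rho_+ > 0$ or $\rho_- < 0$), the variance jumps upward, which reproduces the volatility spikes visible after shocks in Fig. 2.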


3 A Tractable Multivariate Extension of the Two-Sided
Γ-OU-BNS Model

We now present a multivariate two-sided Γ-OU-BNS model, where the univariate
processes still follow the dynamics of a two-sided Γ-OU-BNS model. Here, all
univariate FX rate processes live on the same probability space, and the probability
measure is assumed to be a pricing measure. Besides establishing dependence
between the driving Brownian motions, we want to incorporate dependence into the
Levy drivers, thus establishing dependence among the price jumps as well as among
the variance processes. Jumps in FX rates are mainly driven by unanticipated
macroeconomic events (e.g. interest-rate decisions of some central bank) in one of the
monetary areas. If we consider a multivariate model with one common currency, e.g.
modeling the EUR-USD and the EUR-CHF exchange rates, it is likely that jumps
caused by macroeconomic events in the common currency's monetary area have an
impact on all exchange rates; e.g. the debt crisis of Eurozone countries should affect
both the EUR-USD and the EUR-CHF exchange rate. Hence, dependence of
the jump processes seems to be a desirable feature of a multivariate model for FX
rates with a common currency. To establish dependence between the compound
Poisson drivers, we employ the time-change methodology presented in [15], yielding
an analytically tractable and easy-to-simulate setup.

Definition 1 (Time-changed CPPs with exponential jump sizes) Let $d \in \mathbb{N}$,
$c_0, \eta_1, \ldots, \eta_d > 0$ and $c_1, \ldots, c_d \in (0, c_0)$. Furthermore, let $Y^{(1)}, \ldots, Y^{(d)}$
be $d$ independent compound Poisson processes with intensities $c_1/(c_0 - c_1), \ldots,
c_d/(c_0 - c_d)$ and $\mathrm{Exp}(c_0\eta_1/(c_0 - c_1)), \ldots, \mathrm{Exp}(c_0\eta_d/(c_0 - c_d))$-distributed jump
sizes. To these compound Poisson processes, we apply a time change with another
independent compound Poisson process $T = \{T_t\}_{t\geq 0}$ with $\mathrm{Exp}(1)$-distributed jump
sizes and intensity $c_0$. Define the $T$-subordinated compound Poisson processes
$Z^{(1)}, \ldots, Z^{(d)}$ by $\{Z_t^{(j)}\}_{t\geq 0} := \{Y^{(j)}_{T_t}\}_{t\geq 0}$. We call the $d$-tuple $(Z^{(1)}, \ldots, Z^{(d)})$
a time-change-dependent multivariate compound Poisson process with parameters
$(c_0, c_1, \ldots, c_d, \eta_1, \ldots, \eta_d)$.

At first sight, the subordination of a compound Poisson process with another
compound Poisson process may look strange, particularly in light of interpreting
the time change as “business time”, following the idea of [14]. In this case, however,
we primarily use the joint subordination to introduce dependence via joint jumps
between compound Poisson processes, without the interpretation as “business time”;
the time-change construction is of a technical nature and provides a convenient
simulation scheme.
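The construction of Definition 1 translates directly into a simulation scheme. In the sketch below (all function names hypothetical), each jump of $T$ with size $\tau$ advances the business time of every $Y^{(j)}$ by $\tau$; the jumps of $Y^{(j)}$ falling into that increment are summed and appear as a single jump of $Z^{(j)}$ at the calendar time of the $T$-jump, which is how joint jumps arise. The per-increment Poisson count uses Knuth's method, which is adequate for moderate intensities only.

```python
import math
import random


def poisson(rng, mean):
    """Poisson(mean) draw by Knuth's multiplication method (fine for moderate means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_tc_cpp(c0, cs, etas, t_end, seed=0):
    """Jump lists [(time, size), ...] of Z^(j)_t = Y^(j)_{T_t}, j = 1..d, on [0, t_end].

    T     : compound Poisson, intensity c0, Exp(1) jump sizes (the common time change).
    Y^(j) : compound Poisson, intensity c_j/(c0-c_j), Exp(c0*eta_j/(c0-c_j)) jump sizes.
    Requires 0 < c_j < c0.
    """
    rng = random.Random(seed)
    jumps = [[] for _ in cs]
    t = rng.expovariate(c0)  # first jump time of T
    while t <= t_end:
        tau = rng.expovariate(1.0)  # size of this jump of T, i.e. the business-time increment
        for j, (c, eta) in enumerate(zip(cs, etas)):
            n = poisson(rng, c / (c0 - c) * tau)  # jumps of Y^(j) inside that increment
            if n > 0:
                size = sum(rng.expovariate(c0 * eta / (c0 - c)) for _ in range(n))
                jumps[j].append((t, size))  # they all surface at calendar time t
        t += rng.expovariate(c0)
    return jumps


def unit_increments(jumps_j, t_end):
    """Increments of one coordinate over [k, k + 1), k = 0, ..., int(t_end) - 1."""
    inc = [0.0] * int(t_end)
    for t, size in jumps_j:
        if t < int(t_end):
            inc[int(t)] += size
    return inc


def corr(xs, ys):
    """Sample correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Over a long horizon, the number of jumps of $Z^{(j)}$ should be close to $c_j$ per unit time, the mean jump size close to $1/\eta_j$, and the correlation of unit-time increments close to $\sqrt{c_jc_k}/c_0$, in line with Remark 1.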

Remark 1 (Properties of time-changed CPPs, cf. [15])

(i) Each coordinate $Z^{(j)}$ of the $T$-subordinated compound Poisson process is again a
compound Poisson process with intensity $c_j$ and jump size distribution $\mathrm{Exp}(\eta_j)$,
for all $j = 1, \ldots, d$.

(ii) For $c_{\max} := \max_{1\leq j\leq d}\{c_j\}$, the correlation coefficient of $(Z_t^{(j)}, Z_t^{(k)})$, $1 \leq j \leq d$,
$1 \leq k \leq d$, $j \neq k$, is given by

$$\operatorname{corr}\big[Z_t^{(j)}, Z_t^{(k)}\big] = \frac{\sqrt{c_j c_k}}{c_0} = \kappa\,\frac{\sqrt{c_j c_k}}{c_{\max}}$$

with $\kappa := c_{\max}/c_0 \in (0, 1)$. We call $\kappa$ the time-change correlation parameter. In
particular, correlation coefficients ranging from zero to $\sqrt{c_j c_k}/c_{\max}$ are possible,
and the correlation does not depend on the point in time $t$.

(iii) Due to the common time change, the compound Poisson processes $Z^{(1)}, \ldots, Z^{(d)}$
are stochastically dependent. Moreover, it can be shown that the dependence
structure of the $d$-dimensional process $(Z^{(1)}, \ldots, Z^{(d)})$ is driven solely by the
time-change correlation parameter $\kappa$.
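The correlation formula in (ii) can be checked by a short computation, sketched here as a verification rather than as the proof of [15]. Write $J_j$ for the jump of $Z^{(j)}$ at a jump of $T$ of size $\tau \sim \mathrm{Exp}(1)$, and $\xi_j \sim \mathrm{Exp}(\eta_j)$ for a generic jump size from (i). Given $\tau$, the jumps $J_j$ and $J_k$ are independent with $\mathbb{E}[J_j \mid \tau] = \tau\,\frac{c_j}{c_0 - c_j}\cdot\frac{c_0 - c_j}{c_0\eta_j} = \frac{\tau c_j}{c_0\eta_j}$, and $\mathbb{E}[\tau^2] = 2$. Since $(Z^{(j)}, Z^{(k)})$ is a bivariate compound Poisson process with intensity $c_0$ and jump vector $(J_j, J_k)$,

$$
\begin{aligned}
\operatorname{cov}\big[Z_t^{(j)}, Z_t^{(k)}\big] &= c_0 t\,\mathbb{E}[J_j J_k] = c_0 t\,\mathbb{E}[\tau^2]\,\frac{c_j c_k}{c_0^2\eta_j\eta_k} = \frac{2c_jc_k\,t}{c_0\eta_j\eta_k},\\
\operatorname{var}\big[Z_t^{(j)}\big] &= c_j t\,\mathbb{E}[\xi_j^2] = \frac{2c_j t}{\eta_j^2},\\
\operatorname{corr}\big[Z_t^{(j)}, Z_t^{(k)}\big] &= \frac{2c_jc_k t/(c_0\eta_j\eta_k)}{2\sqrt{c_jc_k}\,t/(\eta_j\eta_k)} = \frac{\sqrt{c_jc_k}}{c_0}.
\end{aligned}
$$

The factor $t$ cancels, which is why the correlation does not depend on the point in time.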

A striking advantage of introducing dependence among the jumps in this manner
is that the time-changed processes $Z^{(1)}, \ldots, Z^{(d)}$ remain in the class of compound
Poisson processes with exponential jump heights, which ensures that the marginal
processes maintain a tractable structure. In particular, the characteristic functions
of the univariate log-price processes in a two-sided Γ-OU-BNS model are still at
hand. Moreover, the univariate processes $Z^{(1)}, \ldots, Z^{(d)}$ can be simulated as ordinary
compound Poisson processes with exponentially distributed jump heights, and their
Laplace transform is known. Hence, we can now define a multidimensional two-sided
Γ-OU-BNS model with dependent jumps.

Definition 2 (Multivariate two-sided Γ-OU-BNS model) A $d$-dimensional stochastic
process $\{S_t\}_{t\geq 0}$ with $S_t = (S_t^{(1)}, \ldots, S_t^{(d)})$ follows a multivariate two-sided
Γ-OU-BNS model with time-change-dependent volatility drivers if the dynamics of
the log-price vector $X_t = (X_t^{(1)}, \ldots, X_t^{(d)}) = (\log S_t^{(1)}, \ldots, \log S_t^{(d)})$ are governed
by the following SDEs:

$$dX_t^{(j)} = \big(\mu_j + \beta_j (\sigma_t^{(j)})^2\big)\,dt + \sigma_t^{(j)}\,dW_t^{(j)} + \rho_+^{(j)}\,dZ_t^{+(j)} + \rho_-^{(j)}\,dZ_t^{-(j)},$$
$$d(\sigma_t^{(j)})^2 = -\lambda_j (\sigma_t^{(j)})^2\,dt + dZ_t^{+(j)} + dZ_t^{-(j)},$$

with $(W^{(1)}, \ldots, W^{(d)})$ being correlated Brownian motions with correlation matrix
$\Sigma$ and, for all $1 \leq j \leq d$, $\mu_j, \beta_j \in \mathbb{R}$, $\rho_+^{(j)} > 0$, $\rho_-^{(j)} < 0$, $\lambda_j > 0$, and
$(Z^{+(1)}, Z^{-(1)}), \ldots, (Z^{+(d)}, Z^{-(d)})$ are pairs of independent compound Poisson
processes with exponential jumps. Furthermore, the $2d$-dimensional Levy process
$(Z^{+(1)}, Z^{-(1)}, \ldots, Z^{+(d)}, Z^{-(d)})$ splits up into two time-change-dependent $d$-tuples
of compound Poisson processes (cf. Definition 1).

At first glance, Definition 2 looks cumbersome, but it is necessary to capture all
combinations of possible dependence. As a simplifying example, one might think
about introducing dependence between $(Z^{+(1)}, \ldots, Z^{+(d)})$ on the one hand and
between $(Z^{-(1)}, \ldots, Z^{-(d)})$ on the other hand. In this case, positive jumps of the
processes are mutually dependent and negative jumps are mutually dependent, but
positive jumps occur independently of negative jumps. A closer examination of how
to establish the dependence structure between the time-change-dependent compound
Poisson processes is given in the following section, since dependence between the
jumps has to be introduced in an economically sound manner.

This construction can further be generalized by employing Levy processes, cou- 
pled by Levy copulas (cf. [11]). For the present investigation, however, we prefer the 
time-change construction presented in Definition 1, since this construction provides 
an immediate stochastic representation of the dependence structure. Thus, a straight- 
forward simulation scheme is provided and at least some analytical tractability when 
doing computational exercises is ensured, which may be more complicated when 
employing general Levy copulas. 

Remark 2 (Calibration of the univariate processes) An immediate corollary of
the compound Poisson structure of the univariate jump processes $(Z^{+(j)}, Z^{-(j)})$,
$j = 1, \ldots, d$, is that the univariate log-price processes $\{X_t^{(j)}\}_{t\geq 0}$, $j = 1, \ldots, d$,
still follow a univariate two-sided Γ-OU-BNS model, and the parameters of the
univariate processes may be calibrated separately to univariate derivative prices.
The dependence parameters, which are the correlation matrix $\Sigma$ of the Brownian
motions and the time-change correlation parameters $\kappa^+$ and $\kappa^-$ that determine the
dependence structure of the time-change-dependent multivariate compound Poisson
processes, can be calibrated separately afterwards without altering the already fixed
marginal distributions. This simplifies the model calibration and is a convenient
feature for practical purposes, because it automatically ensures that the multivariate
model remains consistent with the univariate derivative prices already fitted.


4 Modeling Two FX Rates with a Bivariate Two-Sided
Γ-OU-BNS Model


In this section, we discuss the modeling of FX rates with a bivariate two-sided
Γ-OU-BNS model. In particular, we discuss how to soundly introduce dependence
between the Levy drivers and investigate the “built-in” model for a third FX rate
induced by the model for the two FX rates. We concentrate on the case of two
currency pairs, which best illustrates the problem of choosing the jump dependence
structure.

To ensure familiarity with the FX markets wording, we recall that an FX rate 
is the exchange rate between two currencies, expressed as a fraction. The currency 
in the numerator of the fraction is called (by definition) domestic currency , while 
the currency in the denominator of the fraction is called foreign currency.² The role
each currency plays in an FX rate is defined by market conventions and is often due
to historic reasons, so economic interpretations are not necessarily helpful. A more
detailed discussion of market conventions of FX rates and derivatives is provided in
[22]; a standard textbook on FX rate modeling is [13].


4.1 The Dependence Structure of the Levy Drivers 

Analogously to the multivariate classical Γ-OU-BNS model described in the
previous section, we use the time-change construction to introduce dependence between
the compound Poisson drivers in the bivariate two-sided Γ-OU-BNS model. Since
we want to model dependence between the jumps in different FX rates, we have to 
choose the coupling of the compound Poisson drivers carefully and in a way to cap- 
ture economic intuition: When modeling two FX rates, we may want to establish an 
adequate kind of dependence between the different drivers, accounting separately for 
positive and negative jumps in the respective FX rate. Depending on which currency 
is foreign or domestic in the two currency pairs of the FX rates, dependence may be 


2 The wording “foreign” and “domestic” currency does not necessarily reflect whether the currency 
is foreign or domestic from the point of view of a market participant. The currency EUR, e.g., is 
always foreign currency by market convention. Sometimes, the foreign currency is called underlying
currency, while the domestic currency is called accounting or base currency.
introduced in a different manner to obtain economically sound situations. Hence, we
can distinguish between the following combinations that may occur for two different
FX rates:

1. There are no common currencies, e.g. in the case of EUR-CHF and USD-JPY. 

2. In both FX rates the common currency is the foreign (resp. domestic) currency, 
e.g. EUR-USD and EUR-CHF (EUR-CHF and USD-CHF, respectively). 

3. The common currency is the domestic currency in one FX rate and the foreign 
currency in the other FX rate, e.g. EUR-USD and USD-CHF. 

For the sake of simplicity, we restrict ourselves to the second case, which occurs
in the detailed numerical study in the following section. The other cases can be treated
analogously.

In case of a common foreign currency, a sudden macroeconomic event strength- 
ening (resp. weakening) the common currency should result in an upward (resp. 
downward) jump of both FX rates. Hence, it may be a sensible choice to couple the
drivers for the positive jumps and, separately, the drivers for the negative jumps, to
ensure the occurrence of joint upward and joint downward jumps.


4.2 Implicitly Defined Models 


When two FX rates are modeled and the two rates share a common currency, this
bivariate model always implicitly defines a model for the missing currency pair,
which is not modeled directly; e.g. when modeling the EUR-USD and EUR-CHF
exchange rates simultaneously, the quotient process automatically implies a model
for the USD-CHF exchange rate. As in the bivariate Garman-Kohlhagen model, a
model for the third FX rate is thus implied. However, while the quotient of two
Garman-Kohlhagen rates is again of Garman-Kohlhagen type, modeling two FX rates
directly by a bivariate two-sided BNS model does not necessarily imply a model from
the same family for the quotient or product process; the main structure of a
jump-diffusion-type model is maintained, though.

Lemma 1 (Quotient and product process of a two-sided BNS model) Given two
asset-price processes $\{S_t^{(1)}\}_{t\geq 0}$ and $\{S_t^{(2)}\}_{t\geq 0}$ modeled by a multivariate two-sided
Γ-OU-BNS model, the product and quotient processes $\{S_t^{(1)}S_t^{(2)}\}_{t\geq 0}$ and
$\{S_t^{(1)}/S_t^{(2)}\}_{t\geq 0}$, respectively, are both of jump-diffusion type.

Proof Follows directly from $\log(S_t^{(1)}S_t^{(2)}) = X_t^{(1)} + X_t^{(2)}$ and $\log(S_t^{(1)}/S_t^{(2)}) =
X_t^{(1)} - X_t^{(2)}$. □

Due to symmetry in FX rates, the implied model for the third missing FX rate can 
be used to calibrate the parameters steering the dependence, namely, the correlation 
between the Brownian motions as well as the time-change correlation parameters, or 
equivalently the intensities of the time-change processes. Additionally, the calibration 
performance of the implied model on plain vanilla options provides a plausibility check
of whether the bivariate model may be useful for the evaluation of true bivariate options,
e.g. best-of-two options or spread options.
5 Application: Calibration to FX Rates and Pricing of
Bivariate FX Derivatives 

In this section, we describe the calibration process of a bivariate two-sided BNS 
model to market prices of univariate FX derivatives, which allows us to completely 
specify the model. Furthermore, we describe how to price bivariate FX options like, 
e.g., best-of-two options in a bivariate two-sided BNS model. 


5.1 Data 

As input data for our calibration exercise we use option data on exchange rates 
concerning the three currencies EUR, USD, and SEK. Since the EUR-USD exchange 
rate can be regarded as an implied exchange rate, i.e.

$$\frac{\mathrm{USD}}{\mathrm{EUR}} = \frac{\mathrm{SEK}/\mathrm{EUR}}{\mathrm{SEK}/\mathrm{USD}},$$

we model the two exchange rates EUR-SEK and USD-SEK directly with two-sided
Γ-OU-BNS models as suggested in [1]. For each currency pair EUR-SEK, USD-SEK,
and EUR-USD, we have the implied volatilities of 204 different plain vanilla
options (different maturities, different moneyness) available as input data. The option 
data is as of August 13, 2012, and was provided by Thomson Reuters. 


5.2 Model Setup 


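We consider a market with two traded assets, namely $\{\exp(r_{\mathrm{USD}}t)S_t^{\mathrm{USDSEK}}\}_{t\geq 0}$ and
$\{\exp(r_{\mathrm{EUR}}t)S_t^{\mathrm{EURSEK}}\}_{t\geq 0}$, where $S_t^{\mathrm{USDSEK}}$, $S_t^{\mathrm{EURSEK}}$ denote the exchange rates at
time $t$ and $r_{\mathrm{USD}}$, $r_{\mathrm{EUR}}$, $r_{\mathrm{SEK}}$ denote the risk-free interest rates in the corresponding
monetary areas. These assets can be seen as the future value of a unit of
the respective foreign currency (in this case USD or EUR), valued in the domestic
currency (which is SEK). Assume a risk-neutral measure $Q^{\mathrm{SEK}}$ to be given
with numeraire process $\{\exp(r_{\mathrm{SEK}}t)\}_{t\geq 0}$, i.e. $\{\exp((r_{\mathrm{USD}} - r_{\mathrm{SEK}})t)S_t^{\mathrm{USDSEK}}\}_{t\geq 0}$ and
$\{\exp((r_{\mathrm{EUR}} - r_{\mathrm{SEK}})t)S_t^{\mathrm{EURSEK}}\}_{t\geq 0}$ are martingales with respect to $Q^{\mathrm{SEK}}$, with the log
exchange rates $X_t^{*\mathrm{SEK}} := \log S_t^{*\mathrm{SEK}}$, $* \in \{\mathrm{EUR}, \mathrm{USD}\}$, governed
by the SDEs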
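$$
\begin{aligned}
dX_t^{\star SEK} &= \left( r_{SEK} - r_{\star} - \frac{(\sigma_t^{\star SEK})^2}{2} - \frac{c_+^{\star SEK}\rho_+^{\star SEK}}{\eta_+^{\star SEK} - \rho_+^{\star SEK}} + \frac{c_-^{\star SEK}\rho_-^{\star SEK}}{\eta_-^{\star SEK} + \rho_-^{\star SEK}} \right) dt \\
&\quad + \sigma_t^{\star SEK}\, dW_t^{\star SEK} + \rho_+^{\star SEK}\, dZ_t^{+,\star SEK} - \rho_-^{\star SEK}\, dZ_t^{-,\star SEK}, \\
d(\sigma_t^{\star SEK})^2 &= -\lambda^{\star SEK} (\sigma_t^{\star SEK})^2\, dt + dZ_t^{+,\star SEK} + dZ_t^{-,\star SEK},
\end{aligned}
$$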

for $\lambda^{\star SEK}, \rho_+^{\star SEK}, \rho_-^{\star SEK} > 0$, $\star \in \{\text{EUR}, \text{USD}\}$, with $X_t^{\star SEK} = \log S_t^{\star SEK}$, $\{(W_t^{EURSEK}, W_t^{USDSEK})\}_{t\ge 0}$ being a two-dimensional Brownian motion with correlation $r \in [-1, 1]$, and $\{(Z_t^{+,EURSEK}, Z_t^{+,USDSEK})\}_{t\ge 0}$ and $\{(Z_t^{-,EURSEK}, Z_t^{-,USDSEK})\}_{t\ge 0}$ being (independent) two-dimensional time-change dependent compound Poisson processes with parameters

$$\left(\frac{\max(c_+^{EURSEK}, c_+^{USDSEK})}{\kappa^+},\ c_+^{EURSEK},\ c_+^{USDSEK},\ \eta_+^{EURSEK},\ \eta_+^{USDSEK}\right)$$

and

$$\left(\frac{\max(c_-^{EURSEK}, c_-^{USDSEK})}{\kappa^-},\ c_-^{EURSEK},\ c_-^{USDSEK},\ \eta_-^{EURSEK},\ \eta_-^{USDSEK}\right),$$

where $\kappa^+$ and $\kappa^-$ are the time-change correlation parameters (following the framework in Sect. 4.1). Hence, the EUR-SEK and USD-SEK exchange rates follow a bivariate two-sided BNS model. The implied exchange rate process $\{S_t^{EURUSD}\}_{t\ge 0}$ is given by

$$\{S_t^{EURUSD}\}_{t\ge 0} = \left\{\frac{S_t^{EURSEK}}{S_t^{USDSEK}}\right\}_{t\ge 0}.$$

Due to the change-of-numeraire formula for exchange rates (cf. [19]), the process $\{\exp((r_{EUR} - r_{USD})t)\,S_t^{EURUSD}\}_{t\ge 0}$ is a martingale with respect to $\mathbb{Q}^{USD}$, where $\mathbb{Q}^{USD}$ is determined by the Radon-Nikodym derivative

$$\left.\frac{d\mathbb{Q}^{USD}}{d\mathbb{Q}^{SEK}}\right|_{\mathcal{F}_t} = \frac{S_t^{USDSEK}\exp(r_{USD}t)}{S_0^{USDSEK}\exp(r_{SEK}t)}.$$

5.3 Calibration 


For calibration purposes, we use the volatility surfaces of the EUR-SEK and USD-SEK exchange rates to fit the univariate parameters. Due to the consistency relationships which have to hold between the exchange rates, we can calibrate the dependence parameters by fitting them to the volatility surface of EUR-USD. Even in the presence of other “bivariate options” (e.g. best-of-two options), we argue that European options on the quotient exchange rate currently provide the most liquid and reliable data for calibration.

The calibration of the presented multivariate model is done in two steps. Since the marginal distributions can be separated from the dependence structure within our models, it is possible to keep the parameters governing the dependence
Table 1 Calibrated parameters in the two univariate FX models


★ 

e*SEK 

★ SEK 

a o 

C*SEK 

^★SEK 

7-* SEK 

P*SEK 

#options 

Error (%) 

EUR 

8.229 

0.074 

0.71 

62.13 

3.25 

1.66 

204 

1.08 

USD 

6.664 

0.078 

1.15 

40.81 

2.19 

1.22 

204 

3.17 


separated from the parameters governing the marginal distributions. Therefore, in a first step we independently calibrate both univariate models for the EUR-SEK and USD-SEK exchange rates, and in a second step we calibrate the parameters driving the dependence structure. In doing so, the fixed univariate parameters are not affected by the second step. Since there is little market data on multi-currency options, this two-step method is very appealing: we can decompose one large calibration problem into two smaller ones. The univariate models are calibrated to the volatility surfaces of the EUR-SEK and USD-SEK exchange rates by minimizing the relative distance of model option prices to market prices, with equal weight on every option. Option prices in the univariate two-sided BNS models are obtained via Fourier inversion (cf. [5, 21]) by means of the characteristic function of the log-prices.
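The equal-weight relative-distance objective can be sketched as follows. The text does not say whether absolute or squared relative deviations are used; the sketch assumes the mean absolute relative error, which matches the "average relative error" reported below. A brute-force grid over two parameters, similar in spirit to the error surface of Fig. 3, is one simple way to minimize it; the function names and the toy pricing function are hypothetical stand-ins for the Fourier-based pricer.

```python
import numpy as np


def mean_relative_error(model_prices, market_prices) -> float:
    """Equal-weight average of |model - market| / market over all options."""
    model = np.asarray(model_prices, dtype=float)
    market = np.asarray(market_prices, dtype=float)
    return float(np.mean(np.abs(model - market) / market))


def grid_calibrate(price_fn, market_prices, grid_a, grid_b):
    """Brute-force search for the parameter pair with the smallest mean relative error.

    price_fn(a, b) must return model prices in the same order as market_prices.
    Returns (best_a, best_b, best_error).
    """
    best = (None, None, np.inf)
    for a in grid_a:
        for b in grid_b:
            err = mean_relative_error(price_fn(a, b), market_prices)
            if err < best[2]:
                best = (a, b, err)
    return best


def toy_prices(a, b):
    """Hypothetical stand-in for a model pricer with two parameters."""
    return np.array([a + b, a * b, a])


print(grid_calibrate(toy_prices, [5.0, 6.0, 2.0], [1, 2, 3, 4], [1, 2, 3, 4]))
```

For the univariate step with more parameters, a numerical optimizer would replace the grid.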

Table 1 gives an overview of the calibration results for the univariate models. To reduce the number of parameters, we use symmetric two-sided Γ-OU-BNS models as described in [1]. Furthermore, we assume that the time-change correlation parameters $\kappa^+$ and $\kappa^-$ coincide, maintaining the symmetric structure. The relative error of model prices with respect to market prices of the 204 options serves as the calibration error. The average relative error is about one percent for the EUR-SEK model and around three percent for the USD-SEK model. Hence, the univariate models fit the FX market reasonably well. Each univariate calibration requires about 20 s.

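The calibration of the parameters governing the dependence is done by means of the third, implied exchange rate, namely using the volatility surface of EUR-USD. Model prices of EUR-USD options with payoff function $f$ at maturity $t$ can be obtained by a Monte Carlo simulation of the following expected value:

$$\mathbb{E}^{\mathbb{Q}^{USD}}\left[f\left(S_t^{EURUSD}\right)\exp(-r_{USD}t)\right] = \mathbb{E}^{\mathbb{Q}^{SEK}}\left[f\left(\frac{S_t^{EURSEK}}{S_t^{USDSEK}}\right)\frac{S_t^{USDSEK}}{S_0^{USDSEK}}\exp(-r_{SEK}t)\right]. \quad (1)$$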


Here, we used 100,000 simulations to calibrate the dependence parameters. The execution of the overall optimization procedure takes around four hours. The calibration error of the dependence parameters in terms of average relative error is roughly nine percent, which is still a good result considering that we fit 204 market prices with just two parameters in an implicitly specified model. A more complex model, obtained by relaxing the condition that $\kappa^+$ and $\kappa^-$
Fig. 3 The best-matching correlation between the two Brownian motions is 0.52 and the optimal time-change dependence parameter is $\kappa = 0.96$. This corresponds to a calibration error of around nine percent for the 204 options on this currency pair


coincide, leads to even smaller calibration errors. However, we keep the model as simple as possible to maintain tractability. Figure 3 illustrates the calibration error of this second step for different choices of the dependence parameters. With this, the whole model is fully specified.
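The change-of-numeraire weighting in Eq. (1) can be checked in a setting with a closed-form answer. The Python sketch below replaces the two-sided BNS dynamics with correlated geometric Brownian motions under $\mathbb{Q}^{SEK}$ (a placeholder assumption, not the model of this paper): then EUR-USD is again lognormal and the Garman-Kohlhagen formula gives the exact price. All rates, spot levels and volatilities are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def garman_kohlhagen_call(s0, strike, T, r_dom, r_for, vol):
    """Closed-form FX call price in domestic currency (used here only as a check)."""
    d1 = (math.log(s0 / strike) + (r_dom - r_for + 0.5 * vol**2) * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return s0 * math.exp(-r_for * T) * norm_cdf(d1) - strike * math.exp(-r_dom * T) * norm_cdf(d2)


def eurusd_price_eq1(payoff, s_eursek_T, s_usdsek_T, s0_usdsek, r_sek, T):
    """Monte Carlo estimate of the right-hand side of Eq. (1).

    The factor S_T^USDSEK / S_0^USDSEK, together with the SEK discount factor,
    implements the change of numeraire from Q^SEK to Q^USD.
    """
    weights = s_usdsek_T / s0_usdsek
    return float(np.mean(payoff(s_eursek_T / s_usdsek_T) * weights) * math.exp(-r_sek * T))


# Placeholder dynamics (NOT the two-sided BNS model): correlated GBMs under Q^SEK.
rng = np.random.default_rng(42)
n, T = 200_000, 1.0
r_sek, r_eur, r_usd = 0.010, 0.005, 0.002
s0_eur, s0_usd, vol_eur, vol_usd, corr = 8.2, 6.6, 0.07, 0.10, 0.5
z1 = rng.standard_normal(n)
z2 = corr * z1 + math.sqrt(1.0 - corr**2) * rng.standard_normal(n)
s_eur_T = s0_eur * np.exp((r_sek - r_eur - 0.5 * vol_eur**2) * T + vol_eur * math.sqrt(T) * z1)
s_usd_T = s0_usd * np.exp((r_sek - r_usd - 0.5 * vol_usd**2) * T + vol_usd * math.sqrt(T) * z2)

s0_eurusd = s0_eur / s0_usd
strike = s0_eurusd  # at-the-money call on EUR-USD
mc = eurusd_price_eq1(lambda x: np.maximum(x - strike, 0.0), s_eur_T, s_usd_T, s0_usd, r_sek, T)
vol_eurusd = math.sqrt(vol_eur**2 + vol_usd**2 - 2 * corr * vol_eur * vol_usd)
exact = garman_kohlhagen_call(s0_eurusd, strike, T, r_usd, r_eur, vol_eurusd)
print(mc, exact)
```

The two printed values should agree up to Monte Carlo error; with the BNS model only the path simulation would change, not the weighting.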

Now we are able to price European multi-currency options, for instance a best-of-two call option with payoff at time $t$ given by

$$\max\left(\max\left(\frac{S_t^{USDSEK}}{S_0^{USDSEK}} - K, 0\right), \max\left(\frac{S_t^{EURSEK}}{S_0^{EURSEK}} - K, 0\right)\right),$$

i.e. we consider the maximum of two call options with strike $K > 0$ on the relative performance of two exchange rates. This option can be used as insurance against a weakening SEK, because one receives a payoff if the relative performance of one exchange rate, USD-SEK or EUR-SEK, is greater than $K - 1$. Pricing is done by a Monte Carlo simulation that estimates the corresponding expected value (cf. Eq. (1)). We used 100,000 scenarios to price this option, which takes about four minutes. Figure 4 shows prices of the best-of-two call option for various choices of the dependence parameters.
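A minimal Monte Carlo sketch of the best-of-two payoff above, discounted with the SEK rate under $\mathbb{Q}^{SEK}$. As before, the two-sided BNS dynamics are replaced by correlated geometric Brownian motions (a placeholder assumption), so only the payoff evaluation and the qualitative effect of the correlation carry over; rates and volatilities are hypothetical, while K = 1.1 and t = 1 follow Fig. 4.

```python
import math

import numpy as np


def best_of_two_call_mc(strike, T, r_sek, r_eur, r_usd, vol_eur, vol_usd, corr, n=100_000, seed=1):
    """Monte Carlo price (per unit notional) of
    max(max(S_T^USDSEK/S_0^USDSEK - K, 0), max(S_T^EURSEK/S_0^EURSEK - K, 0)).

    Placeholder dynamics: correlated GBMs under Q^SEK instead of the two-sided BNS model.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = corr * z1 + math.sqrt(1.0 - corr**2) * rng.standard_normal(n)
    # Relative performances S_T / S_0 under Q^SEK.
    perf_eur = np.exp((r_sek - r_eur - 0.5 * vol_eur**2) * T + vol_eur * math.sqrt(T) * z1)
    perf_usd = np.exp((r_sek - r_usd - 0.5 * vol_usd**2) * T + vol_usd * math.sqrt(T) * z2)
    payoff = np.maximum(np.maximum(perf_usd - strike, 0.0), np.maximum(perf_eur - strike, 0.0))
    return float(math.exp(-r_sek * T) * payoff.mean())


for corr in (0.0, 0.52, 0.9):
    print(corr, best_of_two_call_mc(1.1, 1.0, 0.010, 0.005, 0.002, 0.07, 0.10, corr))
```

Lower correlation makes it more likely that at least one rate ends above the strike, so the price should fall as the correlation rises, which is consistent with the sensitivity shown in Fig. 4.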
Fig. 4 Prices of a best-of-two call option where $K = 1.1$ and $t = 1$. One observes that both dependence parameters play an important role for the price of this option. For the optimal parameter setting (Brownian motion correlation 0.52, $\kappa = 0.96$), the fair price of this option is 261 bp


6 Conclusion and Outlook 


We introduced a multi-dimensional FX rate model generalizing the univariate two-sided BNS model in such a way that each FX rate is still modeled as a two-sided BNS model. Thus, the parameters driving the dependence structure can be separated from the marginal distributions. This simplifies the calibration of the overall model tremendously, such that the multi-currency model can be calibrated to plain vanilla FX option prices. As an outlook for further research, it remains to be investigated whether there exists a change of measure from the real-world measure to the martingale measure whose existence we assumed in the first place.
Modeling the Price of Natural Gas
with Temperature and Oil Price
as Exogenous Factors


Jan Müller, Guido Hirsch and Alfred Müller


Abstract The literature on stochastic models for the spot market of gas is dominated by purely stochastic approaches. In contrast to these models, Stoll and Wiebauer [14] propose a fundamental model with temperature as an exogenous factor. A model containing only deterministic, temperature-dependent and purely stochastic components, however, still does not seem able to capture economic influences on the price. In order to improve the model of Stoll and Wiebauer [14], we include the oil price as another exogenous factor. There are at least two fundamental reasons why this should improve the model. First, the oil price can be considered as a proxy for the general state of the world economy. Second, the pricing formulas in oil price indexed gas import contracts in Central Europe are captured by the oil price component. It is shown that the new model can explain price movements of the last few years much better than previous models. The inclusion of oil price and temperature in the regression of a least squares Monte Carlo method leads to more realistic valuation results for gas storages and swing options.

Keywords Gas spot price • Oil price model • Temperature • Gas storage valuation • 
Least squares Monte Carlo • Seasonal time series model 


1 Introduction 


In recent years, trading in natural gas has become more important. The quantities traded over-the-counter and on energy exchanges have increased strongly, and new products have been developed. For example, swing options increase the flexibility of suppliers and are used as an instrument for risk management. Gas storages are important facilities for the security of supply.

These are two examples of complex American-style real options that illustrate the need for reliable pricing methods. Both options rely on nontrivial trading strategies where exercise decisions are taken under uncertainty. Therefore, analytic pricing formulas cannot be expected. The identification of an optimal trading strategy under uncertainty is a typical problem of stochastic dynamic programming, but even then numerical solutions are difficult to obtain due to the curse of dimensionality. Therefore, simulation-based approximation algorithms have been successfully applied in this area. Longstaff and Schwartz [9] introduced the least squares Monte Carlo method for the valuation of American options. Meinshausen and Hambly [10] extended the idea to swing options, and Boogert and de Jong [5] applied it to the valuation of gas storages. Their least squares Monte Carlo algorithm requires a stochastic model for daily spot prices that generates adequate gas price scenarios. We prefer this approach to methods using scenario trees or finite differences as it is independent of the underlying price process.
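To make the least squares Monte Carlo idea concrete, here is a minimal Python sketch of the Longstaff-Schwartz algorithm in its simplest setting: an American put on an asset following geometric Brownian motion. It is not a gas storage or swing option model, which would add volume constraints and, in this paper, the exogenous factors as regressors; the quadratic polynomial basis and the simulation sizes are illustrative choices.

```python
import numpy as np


def lsm_american_put(s0, strike, r, sigma, T, n_steps=50, n_paths=20_000, degree=2, seed=0):
    """Least squares Monte Carlo price of an American put under GBM.

    Exercise is allowed on the grid t_1, ..., t_N; the continuation value is
    approximated by a polynomial regression on in-the-money paths only.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(increments, axis=1))  # column j holds S at t_{j+1}
    disc = np.exp(-r * dt)

    cash = np.maximum(strike - paths[:, -1], 0.0)  # payoff if held to maturity
    for j in range(n_steps - 2, -1, -1):
        cash *= disc  # discount realized future cash flows back to t_{j+1}
        s = paths[:, j]
        exercise = np.maximum(strike - s, 0.0)
        itm = np.flatnonzero(exercise > 0.0)
        if itm.size <= degree:
            continue
        coeffs = np.polyfit(s[itm], cash[itm], degree)  # regression estimate of continuation value
        continuation = np.polyval(coeffs, s[itm])
        stop = itm[exercise[itm] > continuation]
        cash[stop] = exercise[stop]  # exercise now replaces the later cash flow
    return max(float(np.mean(cash) * disc), strike - s0)


# Benchmark case used by Longstaff and Schwartz [9]: S0 = 36, K = 40, r = 6 %, sigma = 20 %, T = 1.
print(lsm_american_put(36.0, 40.0, 0.06, 0.20, 1.0))
```

The result should lie somewhat above the corresponding European put value, reflecting the early exercise premium.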

The financial literature on stochastic gas price models is dominated by purely sto- 
chastic approaches. The one- and two-factor models by Schwartz [12] and Schwartz 
and Smith [13] are general approaches applicable to many commodities, such as oil 
and gas. The various factors represent short- and long-term influences on the price. 
An important application of gas price models is the valuation of gas storage facilities. Within this context, Chen and Forsyth [7] and Boogert and de Jong [6] propose gas price models. Chen and Forsyth [7] analyze regime-switching approaches incorporating mean-reverting processes and random walks. The class of factor models is extended by Boogert and de Jong [6]. The three factors in their model represent short- and long-term fluctuations as well as the behavior of the winter-summer spread. In contrast to these models, Stoll and Wiebauer [14] propose a fundamental model with temperature as an exogenous factor. They use the temperature component as an approximation of the filling level of gas storages, which has a considerable influence on the price.

There is a fundamental difference between the model of Stoll and Wiebauer [14] and the other models mentioned before as far as their stochastic behavior is concerned. Incorporating cumulated heating degree days over a winter as an explanatory variable leads to a seasonal effect in the variance of the prices. In this model the variance of the gas prices increases over the winter depending on the actual weather conditions and has a maximum at the end of winter. This is much more in line with the observations than the behavior of the model of Boogert and de Jong [6], where the variance of the gas price has a minimum at the end of winter as there is no effect of the winter-summer spread used there. Another major difference is the use of exogenous variables that can be observed; thus the optimal exercise decision for American-style options depends on these variables, and therefore the prices of these real options will also differ.

In this paper we extend the model of Stoll and Wiebauer [14] by introducing 
another exogenous factor to their model: the oil price. There are at least two reasons 
why we believe that this is useful. The main reason is that an oil price component can be considered as a proxy for the future state of the world economy. In contrast to other indicators, such as the gross domestic product (GDP), futures prices for oil are available on a daily basis. Furthermore, the import prices for gas in countries such as Germany are known to be oil price indexed.

Apart from the GDP or the oil price, there might be further candidates for an explanatory variable in the model. The most natural choice would be the forward gas price. We prefer the oil price as it allows us to value gas derivatives that are oil price indexed, as is often the case for gas swing contracts. For the valuation of such swing contracts, gas price scenarios are needed as well as corresponding oil price scenarios. This application is hardly possible with explanatory variables other than the oil price.

The rest of the paper is organized as follows. In Sect. 2 we introduce the model by Stoll and Wiebauer [14], including a short description of their model for the temperature component. In Sect. 3 we discuss the need for an oil price component in the model and explain the choice of the component in our model. Then we fit the model to data in Sect. 4. The new model is used within a least squares Monte Carlo algorithm for the valuation of gas storages and swing options in Sect. 5, where the exogenous factors are included in the regression to approximate the continuation value. We finish with a short conclusion in Sect. 6.


2 A Review of the Model by Stoll and Wiebauer (2010) 

Modeling the price of natural gas in Central Europe requires knowledge about the 
structure of supply and demand. On the supply side there are only a few sources in 
Central Europe, while most of the natural gas is imported from Norway and Russia. 
On the demand side there are mainly three classes of gas consumers: Households, 
industrial companies, and gas fired power plants. While households only use gas for 
heating purposes at low temperatures, industrial companies use gas as heating and 
process gas. Households and industrial companies are responsible for the major part 
of the total gas demand. 

These two groups of consumers cause seasonalities in the gas price: 

• Weekly seasonality: Many industrial companies need less gas on weekends as their 
operation is restricted to working days. 

• Yearly seasonality: Heating gas is needed mainly in winter at low temperatures. 

An adequate gas price model has to incorporate these seasonalities as well as stochastic deviations from them.

Stoll and Wiebauer [14] propose a model meeting these requirements and incor- 
porating another major influence factor: the temperature. To a certain extent the 
temperature dependency is already covered by the deterministic yearly seasonality. 
This component describes the direct influence of temperature: The lower the tem- 
perature, the higher the price. But the temperature influence is more complex than 
this. A day with an average temperature of 0 °C at the end of a long cold winter has a different impact on the price than a daily average of 0 °C at the end of a “warm” winter. Similarly, a cold day at the end of a winter has a different impact on the price than a cold day at the beginning of the winter.

The different impacts are due to gas storages that are essential to cover the demand 
in winter. The total demand for gas is higher than the capacities of the gas pipelines 
from Norway and Russia. Therefore, gas providers use gas storages. These storages 
are filled during summer (at low prices) and emptied in winter months. At the end of 
a long and cold winter most gas storages will be rather empty. Therefore, additional 
cold days will lead to comparatively higher prices than in a normal winter. 

The filling level of all gas storages in the market would be the adequate variable 
to model the gas price. However, these data are not available as they are private 
information. Therefore, we need a proxy variable for it. As the filling levels of gas 
storages are strongly related to the demand for gas which in turn depends on the 
temperature, an adequate variable can be derived from the temperature. 

Stoll and Wiebauer [14] use normalized cumulated heating degree days to cover the influence of temperature on the gas price. They define a temperature of 15 °C as the heating limit: any temperature below 15 °C makes households as well as companies switch on their heating systems. Heating degree days are measured by $HDD_t = \max(15 - T_t, 0)$, where $T_t$ is the average temperature of day $t$. As mentioned above, the impact on the price depends on the number of cold days observed so far in the winter. In this context, winter refers to 1 October and the 181 following days until the end of March. We write $HDD_{d,w}$ for $HDD_t$ if $t$ is day number $d$ of winter $w$. Cumulating heating degree days over a winter yields a number indicating how cold the winter has been so far. We thus define the cumulated heating degree days on day $d$ in winter $w$ as

$$CHDD_{d,w} = \sum_{k=1}^{d} HDD_{k,w} \quad \text{for } 1 \le d \le 182. \quad (1)$$


The impact of cumulated heating degree days on the price depends on the comparison with a normal winter. This information is included in the normalized cumulated heating degree days

$$A_{d,w} = CHDD_{d,w} - \frac{1}{w-1}\sum_{l=1}^{w-1} CHDD_{d,l} \quad \text{for } 1 \le d \le 182. \quad (2)$$
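Eqs. (1) and (2) translate directly into array operations. The following Python sketch assumes the daily temperatures have already been arranged into one row of 182 days per winter (non-leap winters, chronological order); the function names are hypothetical, and the summer interpolation of $A_t$ described next is not included.

```python
import numpy as np

WINTER_DAYS = 182  # 1 October plus the 181 following days (non-leap winter)


def heating_degree_days(temps, limit=15.0):
    """HDD_t = max(limit - T_t, 0) for daily average temperatures T_t in degrees Celsius."""
    return np.maximum(limit - np.asarray(temps, dtype=float), 0.0)


def normalized_chdd(winter_temps):
    """Normalized cumulated heating degree days A_{d,w} from Eqs. (1) and (2).

    winter_temps has shape (n_winters, 182), one row per winter in chronological order.
    Row 0 is NaN because Eq. (2) needs at least one earlier winter.
    """
    chdd = np.cumsum(heating_degree_days(winter_temps), axis=1)  # Eq. (1)
    a = np.full(chdd.shape, np.nan)
    for w in range(1, chdd.shape[0]):
        a[w] = chdd[w] - chdd[:w].mean(axis=0)  # Eq. (2): subtract the mean of all earlier winters
    return a


# Two hypothetical winters: constant 5 degC, then constant 0 degC.
temps = np.vstack([np.full(WINTER_DAYS, 5.0), np.full(WINTER_DAYS, 0.0)])
print(normalized_chdd(temps)[1, -1])
```

The second, colder winter ends with a positive value, in line with the interpretation that positive values indicate a winter colder than the average of the earlier ones.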


We use $A_t$ instead of $A_{d,w}$ for simplicity if $t$ is a day in a winter. For a summer day, $A_t$ is defined by a linear return to zero during the summer. This reflects the fact that we use $A_t$ as a proxy variable for the filling levels of gas storages: assuming a constant filling rate during summer, we obtain the linear part of the normalized cumulated heating degree days (see Fig. 1). Positive values of $A_t$ describe winters colder than average. $A_t$ is included in the gas price model by a regression approach. As the seasonal components and the normalized cumulated
Fig. 1 Normalized cumulated heating degree days calculated based on temperature data from Eindhoven, Netherlands, for 2003-2011 (Source: Royal Netherlands Meteorological Institute)



heating degree days are linear with respect to the parameters, we can use ordinary least squares regression for parameter estimation. The complete model can be written as

$$G_t = m_t + \alpha \cdot A_t + X_t^{(G)} + Y_t^{(G)} \quad (3)$$

with the day-ahead price of gas $G_t$, the deterministic seasonality $m_t$, the normalized cumulated heating degree days $A_t$, an ARMA process $X_t^{(G)}$, and a geometric Brownian motion $Y_t^{(G)}$. For model calibration, day-ahead gas prices from the TTF market (Source: ICE) are used. The Dutch gas trading hub TTF offers the highest trading volumes in Central Europe. As corresponding temperature data we choose daily average temperatures from Eindhoven, Netherlands (Source: Royal Netherlands Meteorological Institute). The fit to historical prices before the crisis can be seen in Fig. 2. Outliers have been removed (see Sect. 4 for details on the treatment of outliers).


3 The Oil Price Dependence of Gas Prices 

The model described in Eq. (3) is capable of covering all influences on the gas price related to changes in temperature. However, changes in the economic situation are not covered by that model. This was clearly observable in the economic crisis of 2008/2009 (see Fig. 5). During that crisis, the demand for gas by industrial companies in Central Europe fell by more than 10 %. As a consequence, the gas price rapidly decreased by more than 10 EUR per MWh.

The oil price showed a similar behavior in that period. Economic changes are the main drivers of substantial changes in the oil price level. Short-term price movements caused by speculators or other effects lead to deviations from the price level that represents the state of the world economy. Therefore, gas price changes often correspond to long-term changes in the oil price level. Such an influence can be
Fig. 2 $m_t + \alpha \cdot A_t$ from Eq. (3) (black) fitted to TTF prices from 2004-2009 (grey)




Fig. 3 In a 3-1-3 formula the price is determined by the average price of 3 months (March to May). This price is valid for July-September. The next day of price fixing is 1 October


modeled by means of a moving average of past oil prices. The averaging procedure 
removes short-term price movements if the averaging period is chosen sufficiently 
long. The result is a time series containing only the long-term trends of the oil 
price. Using such an oil price component in a gas price model explains the gas price 
movements due to changes in the economic situation. This consideration is in line 
with He et al. [8]. They identify cointegration between crude oil prices and a certain 
indicator of global economic activity. 

Another important argument for the use of this oil price component is based on 
Central European gas markets. Countries such as Germany import gas via long- 
term contracts that are oil price indexed. This indexation can be described by three 
parameters: 

1. The number of averaging months. The gas price is the average of past oil prices within a certain number of months.

2. The time lag. There may be a time lag between the months over which the average is taken and the months for which the price is valid.

3. The number of validity months. The price is valid for a certain number of months.

An example of a 3-1-3 formula is given in Fig. 3. 
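The three parameters above define a monthly step function. The Python sketch below computes it from monthly average oil prices; the month indexing (0 = January), the choice of the first fixing month and the toy oil prices are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def indexed_price(monthly_oil, fixing_month, n_avg, lag):
    """Price fixed at the start of month `fixing_month` (0-based) under an n_avg-lag-n_valid
    formula: mean of the n_avg monthly oil prices ending `lag` months before that month."""
    start, stop = fixing_month - lag - n_avg, fixing_month - lag
    if start < 0:
        raise ValueError("not enough oil price history for this fixing")
    return float(np.mean(monthly_oil[start:stop]))


def indexed_series(monthly_oil, n_avg, lag, n_valid, first_fixing):
    """Monthly step function of an oil price indexed gas price; NaN before the first fixing."""
    prices = np.full(len(monthly_oil), np.nan)
    for m in range(first_fixing, len(monthly_oil), n_valid):
        prices[m:m + n_valid] = indexed_price(monthly_oil, m, n_avg, lag)
    return prices


# 3-1-3 example of Fig. 3: months 0..11 = January..December, toy oil price = month index.
oil = np.arange(12.0)
print(indexed_series(oil, n_avg=3, lag=1, n_valid=3, first_fixing=6))  # first fixing on 1 July
```

With this toy input the July-September price is the March-May average, and the next fixing on 1 October uses June-August, matching the timeline in Fig. 3.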

The formulas used in the gas import contracts are not known to all market partici- 
pants. Theoretically, any choice of three natural numbers is possible. But from other 
products, like oil price indexed gas swing options, we know that some formulas are 
more popular than others. Examples of common formulas are 3-1-1, 3-1-3, 6-1-1, 


Fig. 4 The oil price (grey), the 6-0-1 formula (black step function) and the moving average of 
180 days (black) 


6-1-3, and 6-3-3. Therefore, we assume that these formulas are relevant for import 
contracts as well. 

As there are many different import contracts with possibly different price formulas 
we cannot be sure that one of the mentioned formulas is responsible for the price 
behavior on the market. The mixture of different formulas might affect the price in 
the same way as a common formula or a similar one. 

Evaluating the formula leads to price jumps every time the price is fixed. The impact on the gas price will be smoother, however. The new gas price determined on a fixing day is the result of averaging a number of past oil prices. The closer the fixing day, the more of the prices entering the average are already known. Therefore, market participants have reliable estimates of the new import price. If the new price is expected to be higher, it is cheaper to buy gas in advance and store it. This increases the day-ahead price prior to the fixing day and leads to a smooth transition from the old to the new price level on the day-ahead market.

This behavior of market participants leads to some smoothness of the price. In order to include this fact in a model, a smoothed price formula can be used. A sophisticated smoothing approach for forward price curves is introduced by Benth et al. [3]. They impose smoothness conditions at the knots between different price intervals and show that splines of order four meet all these requirements and ensure that the result is a smooth curve. As our price formulas are step functions like forward price curves, this approach is applicable to our situation.

If the number of validity months is equal to one it is possible to use a moving 
average instead of a (smoothed) step function to simplify matters (see Fig. 4). This 
alternative is much less complex than the approach with smoothing by splines, and 
delivers comparable results. Therefore, the simpler method is applied in case of 
formulas with one validity month. 
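The moving-average alternative is easy to compute with cumulative sums. In this Python sketch the window is counted in observations and includes the current day; whether the paper's 180-day average uses calendar days or trading days, and whether it includes the current day, is not stated, so both are assumptions.

```python
import numpy as np


def trailing_moving_average(daily_prices, window=180):
    """Average of the last `window` observations up to and including day t; NaN until enough data."""
    x = np.asarray(daily_prices, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        c = np.concatenate(([0.0], np.cumsum(x)))  # c[k] = x[0] + ... + x[k-1]
        out[window - 1:] = (c[window:] - c[:-window]) / window
    return out


print(trailing_moving_average(np.arange(10.0), window=3))
```

Compared with a smoothed step function, this avoids choosing fixing dates and spline knots, at the price of only approximating the formula.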
In the next section we compare various formulas regarding their ability to explain the price behavior on the gas market.


4 Model Calibration with Temperature and Oil Price 

We now compare different formulas of oil prices in the regression model in order to 
find the one explaining the gas price best (see Fig. 5). 

For the choice of the best formula we use the coefficient of determination $R^2$ as the measure of goodness-of-fit. We choose the reasonable formula leading to the highest value of $R^2$. Reasonable, in this context, means that we restrict our analysis to formulas that are equal or similar to the ones known from other oil price indexed products (compare Sect. 3). The result of this comparison is a 6-0-1 formula (see Fig. 6). Although this is not a common formula, there is an explanation for it: the gas price decreased approximately six months later than the oil price in the crisis. This major price movement needs to be covered by the oil price component. As explained above, we replace the step function by a moving average. Taking the moving average over 180 days is a good approximation of the 6-0-1 formula. All in all, the oil price component increases $R^2$ from 0.35 to 0.83 (see Fig. 5). Even if the new model is applied to data before the crisis, the oil price component is significant. In that period the increase in $R^2$ is smaller, but the component still improves the model.

These comparisons give evidence that both considerations in the previous section 
are valid. The included oil price component can be seen as the smoothed version 
of a certain formula. At the same time it can be considered as a variable describing 
economic influences indicated by the trends and level of the oil price. 

Therefore, we model the gas price by the new model

$$G_t = m_t + \alpha_1 A_t + \alpha_2 \Psi_t + X_t^{(G)} \quad (4)$$

with $\Psi_t$ being the oil price component. This means that the unobservable factor $Y_t^{(G)}$ in Stoll and Wiebauer [14] is replaced by the observable factor $\Psi_t$.

Parameter estimation of our model is based on the same data sources as the model by Stoll and Wiebauer [14]. However, we extend the period to 2011. Additionally, we need historical data for the estimation of the oil price component. Therefore, we use prices of the front-month contracts of Brent crude oil traded on the Intercontinental Exchange (ICE) from 2002-2011. Using these data, we can estimate all parameters by ordinary least squares regression after removing outliers from the gas price data $G_t$.
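Because Eq. (4) is linear in its coefficients, the estimation is a single least squares problem. The Python sketch below uses a hypothetical specification of the seasonality $m_t$ (intercept, annual sine and cosine, weekday dummies), since its exact form is not given in this section, and synthetic stand-ins for $A_t$ and $\Psi_t$; it also returns the $R^2$ used above for comparing oil price formulas.

```python
import numpy as np


def design_matrix(day_index, a_t, psi_t):
    """Regressors for Eq. (4): a hypothetical seasonal part m_t (intercept, annual
    sine/cosine, weekday dummies) plus the exogenous factors A_t and Psi_t."""
    days = np.asarray(day_index, dtype=int)
    omega = 2.0 * np.pi * days / 365.25
    cols = [np.ones(days.size), np.sin(omega), np.cos(omega)]
    cols += [(days % 7 == k).astype(float) for k in range(1, 7)]  # weekday 0 is the reference
    cols += [np.asarray(a_t, dtype=float), np.asarray(psi_t, dtype=float)]
    return np.column_stack(cols)


def ols_with_r2(X, y):
    """Ordinary least squares: coefficients, residuals and coefficient of determination R^2."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, float(r2)


# Synthetic illustration only: toy A_t and Psi_t, made-up coefficients.
rng = np.random.default_rng(0)
days = np.arange(1500)
a_t = np.cumsum(rng.normal(0.0, 5.0, days.size))
psi_t = 60.0 + np.cumsum(rng.normal(0.0, 1.0, days.size))
X = design_matrix(days, a_t, psi_t)
true_beta = np.array([5.0, 3.0, 2.0, -0.5, -0.5, -0.5, -0.5, -2.0, -2.5, 0.01, 0.3])
gas = X @ true_beta + rng.normal(0.0, 1.0, days.size)
beta, resid, r2 = ols_with_r2(X, gas)
print(round(r2, 3))
```

Comparing oil price formulas then amounts to swapping the `psi_t` column and comparing the resulting $R^2$ values; the residuals feed the stochastic process $X_t^{(G)}$.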

Outliers can be due to technical problems or a fire at a major gas storage. We 
exclude the prices on these occasions by an outlier treatment proposed by Weron [15], 
where values outside a range around a running median are declared to be outliers. 
The range is defined as three times the standard deviation. The identified outliers 
Fig. 5 The model of Stoll and Wiebauer [14] (bold black) and our model (thin black line) fitted to historical gas prices (grey)




Fig. 6 Comparison of different oil price components in the model: 6-0-1 formula (bold black), 
6-1-1 formula (grey) and 3-0-1 formula (thin black) fitted to the historical prices (dark grey) 


are excluded in the regression. We do not remove them from our model, however, as 
they are still included in the estimation of the parameters of the remaining stochastic 
process. 
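A minimal sketch of a running-median outlier filter in the spirit described above. The window length, the edge padding, and taking the standard deviation of the deviations from the running median are assumptions, since these details are not specified here; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def flag_outliers(prices, window=31, n_std=3.0):
    """Flag observations farther than n_std standard deviations from a centered running median.

    Assumptions: odd window length, edge padding at both ends, and the standard deviation
    taken over the deviations from the running median.
    """
    x = np.asarray(prices, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    running_median = np.median(np.lib.stride_tricks.sliding_window_view(padded, window), axis=1)
    deviation = x - running_median
    return np.abs(deviation) > n_std * deviation.std()


prices = np.full(101, 20.0)
prices[50] = 120.0  # hypothetical spike, e.g. a technical problem at a hub
mask = flag_outliers(prices)
clean_for_regression = prices[~mask]  # excluded from the OLS fit only
print(np.flatnonzero(mask))
```

As in the text, the mask only drops points from the regression; the full series, outliers included, would still be used when estimating the residual stochastic process.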

Altogether, these model components give fundamental explanations for the his- 
torical day-ahead price behavior. Short-term deviations are included by a stochastic 
process (see Sect. 4.3). Long-term uncertainty due to the uncertain development of 
the oil price is included by the oil price process. Therefore, our model is able to 
generate reasonable scenarios for the future (see Fig. 7). In the following, we specify the stochastic models for the exogenous factors $\Psi_t$ and $A_t$ as well as the stochastic process $X_t^{(G)}$.
Fig. 7 The historical gas price (2008-2012) and its extensions by two realizations of the gas price process for 2012-2013




4.1 Oil Price Model 


Oil prices show a different behavior than gas prices, which influences the choice 
of an adequate model. The most obvious fact is the absence of any seasonalities or 
deterministic components. Therefore, we model the oil price without a deterministic 
function or fundamental component. Another major difference affects the stochastic 
process. While oil prices and logarithmic oil prices are not stationary, the gas price is stationary after removal of seasonalities and fundamental components.

A very common model for nonstationary time series is the Brownian motion 
with drift applied to logarithmic prices. Drift and volatility of this process can be 
determined using historical data or by any estimation of the future volatility. For a sta- 
tionary process, the use of an Ornstein-Uhlenbeck process or its discrete equivalent, 
an AR(1) process, is an appropriate simple model. 

A combination of these two simple modeling approaches is given by the two- 
factor model by Schwartz and Smith [13]. They divide the log price into two factors: 
one for short-term variations and one for long-term dynamics. 

$$\psi_t = \exp(\chi_t + \xi_t) \quad (5)$$

with an AR(1) process $\chi_t$ (short-term variations) and a Brownian motion $\xi_t$ (long-term dynamics). These processes are correlated. We apply this two-factor model as it considers both long- and short-term variations. The estimation of its parameters is more complex, because the factors are not observable on the market. Following Schwartz and Smith [13], we apply the Kalman filter for parameter estimation.

The resulting process $(\psi_t)$ is used to derive the process $(\Psi_t)$ in Eq. (4).
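Scenario generation from Eq. (5) can be sketched as below. The discretization (an AR(1) step for $\chi_t$, an Euler step with drift for $\xi_t$, correlated Gaussian shocks) and all parameter values are illustrative assumptions; the Kalman filter estimation used in the paper is not shown.

```python
import numpy as np


def simulate_schwartz_smith(chi0, xi0, kappa, sigma_chi, mu_xi, sigma_xi, corr,
                            n_days, dt=1.0 / 252, n_paths=1, seed=0):
    """Simulate psi_t = exp(chi_t + xi_t) as in Eq. (5).

    chi: AR(1) short-term factor, chi_{k+1} = exp(-kappa dt) chi_k + sigma_chi sqrt(dt) e1
    xi:  Brownian motion with drift, xi_{k+1} = xi_k + mu_xi dt + sigma_xi sqrt(dt) e2
    with corr(e1, e2) = corr. A simple discretization, not an exact one.
    """
    rng = np.random.default_rng(seed)
    phi = np.exp(-kappa * dt)
    chi = np.empty((n_paths, n_days + 1))
    xi = np.empty((n_paths, n_days + 1))
    chi[:, 0], xi[:, 0] = chi0, xi0
    for k in range(n_days):
        e1 = rng.standard_normal(n_paths)
        e2 = corr * e1 + np.sqrt(1.0 - corr**2) * rng.standard_normal(n_paths)
        chi[:, k + 1] = phi * chi[:, k] + sigma_chi * np.sqrt(dt) * e1
        xi[:, k + 1] = xi[:, k] + mu_xi * dt + sigma_xi * np.sqrt(dt) * e2
    return np.exp(chi + xi)


# Hypothetical parameters, two years of daily oil price scenarios.
paths = simulate_schwartz_smith(0.1, np.log(100.0), kappa=1.5, sigma_chi=0.3,
                                mu_xi=0.0, sigma_xi=0.2, corr=0.3, n_days=504, n_paths=2)
print(paths[:, -1])
```

Each simulated path of $\psi_t$ would then be averaged over a trailing window to obtain a scenario for the component $\Psi_t$.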




Fig. 8 Top: Fit of deterministic function (black) to the historical daily average temperature (grey) in Eindhoven, Netherlands. Bottom: Autocorrelation function (left) and partial autocorrelation function (right) of residual time series (black) and innovations of AR(3) process (grey)


4.2 Temperature Model 

When modeling daily average temperature we can make use of a long history of 
temperature data. Here, a yearly seasonality and a linear trend can be identified. 
Therefore, we use a temperature model closely related to the one proposed by Benth 
and Benth [2]. 


/ Z7 ™ \ / \ (T) 

T, = «, + a 2 , + « an (5^ j + «4 cos (5^ j + X, (6) 

(T) 

with $X_t^{(T)}$ being an AR(3) process. The fit of the deterministic part (ordinary least squares regression) and of the AR(3) process is shown in Fig. 8. The process $(T_t)$ (see Fig. 9) is then used to define the derived process $(A_t)$ of normalized cumulated heating degree days as described in Sect. 2.
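The two-stage fit of the temperature model can be sketched as follows: an ordinary least squares regression on a constant, a linear trend and one yearly harmonic, followed by an AR(3) fit to the residuals. For the AR part this sketch uses conditional least squares, which coincides with conditional Gaussian maximum likelihood but is not necessarily the exact estimator used in the paper. The function name is hypothetical.

```python
import numpy as np

def fit_temperature_model(temps, p=3):
    """Fit T_t = a1 + a2*t + a3*sin(2*pi*t/365) + a4*cos(2*pi*t/365) + X_t by OLS,
    then fit X_t = sum_j phi_j * X_{t-j} + e_t (AR(p)) to the residuals by OLS.
    Returns (alpha, phi, sigma) with sigma the innovation standard deviation."""
    temps = np.asarray(temps, dtype=float)
    t = np.arange(len(temps))
    design = np.column_stack([np.ones(len(t)), t,
                              np.sin(2 * np.pi * t / 365),
                              np.cos(2 * np.pi * t / 365)])
    alpha, *_ = np.linalg.lstsq(design, temps, rcond=None)
    resid = temps - design @ alpha
    # Column j-1 holds X_{t-j} for t = p, ..., n-1
    lags = np.column_stack([resid[p - j:len(resid) - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, resid[p:], rcond=None)
    sigma = np.std(resid[p:] - lags @ phi, ddof=p)
    return alpha, phi, sigma
```

Simulated temperature paths, such as the extensions in Fig. 9, would then combine the fitted deterministic part with simulated AR(3) residuals.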


4.3 The Residual Stochastic Process 

The fit of normalized cumulated heating degree days, oil price component, and deter- 
ministic components to the gas price via ordinary least squares regression (see Fig. 10) 
results in a residual time series. These residuals contain all unexplained, “random” 
deviations from the usual price behavior. 
Fig. 9 Historical temperatures and their extension by two realizations of the temperature model (horizontal axis: month)

Fig. 10 Fit of deterministic function and exogenous components (black) to the historical gas price (grey) (horizontal axis: year)


The residuals exhibit a strong autocorrelation at the first lag. Further analysis of the partial autocorrelation function reveals that an ARMA(1,2) process provides a good fit (see Fig. 11). The empirical innovations of the process have heavier tails than a normal distribution (compare Stoll and Wiebauer [14]). Therefore, we use a heavy-tailed distribution: the normal-inverse Gaussian (NIG) distribution, a member of the class of generalized hyperbolic distributions introduced by Barndorff-Nielsen [1], leads to a remarkably good fit (see Fig. 11).

Both the distribution of the innovations and the parameters of the autoregressive processes are estimated by maximum likelihood.
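The residual model can be illustrated by simulation. The sketch below draws NIG(alpha, beta, mu, delta) innovations through the standard normal variance-mean mixture X = mu + beta*Z + sqrt(Z)*N, where Z is inverse Gaussian with mean delta/gamma and shape delta^2 (gamma = sqrt(alpha^2 - beta^2)), and feeds them into an ARMA(1,2) recursion. It only simulates; it does not perform the maximum likelihood estimation described above, and any parameter values are hypothetical.

```python
import numpy as np

def sample_nig(alpha, beta, mu, delta, size, rng):
    """NIG(alpha, beta, mu, delta) variates via X = mu + beta*Z + sqrt(Z)*N,
    with Z inverse Gaussian (numpy's Wald distribution)."""
    gamma = np.sqrt(alpha**2 - beta**2)
    z = rng.wald(delta / gamma, delta**2, size)  # mean delta/gamma, shape delta^2
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(size)

def simulate_arma12(a, b1, b2, eps):
    """Y_t = a*Y_{t-1} + eps_t + b1*eps_{t-1} + b2*eps_{t-2}, started at zero."""
    y = np.zeros(len(eps))
    for t in range(len(eps)):
        y[t] = eps[t]
        if t >= 1:
            y[t] += a * y[t - 1] + b1 * eps[t - 1]
        if t >= 2:
            y[t] += b2 * eps[t - 2]
    return y
```

In practice one would center the innovations (choose mu such that the mean mu + beta*delta/gamma is zero) before passing them to the ARMA recursion.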
Fig. 11 Top: ACF (left) and PACF (right) of residual time series (black) and innovations of ARMA(1,2) process (grey). Bottom: Fit of NIG distribution (grey) to kernel density of empirical innovations (black)


5 Option Valuation by Least Squares Monte Carlo Including 
Exogenous Components 


Optimally exercising the flexibility of gas storages and swing options is a decision under uncertainty. While the price for the next day is known, the future development of spot prices is uncertain, and gas withdrawn today can no longer be withdrawn on a future day at a possibly higher price. Identifying an optimal trading strategy under this uncertainty is a typical problem of stochastic dynamic programming, and simulation-based approximation algorithms have been applied successfully in this area. Longstaff and Schwartz [9] introduced the least squares Monte Carlo method for the valuation of American options, Meinshausen and Hambly [10] extended the idea to swing options, and Boogert and de Jong [5] applied it to the valuation of gas storages. Furthermore, Boogert and de Jong [6] found that including the different components of the gas price process in the regression of the least squares Monte Carlo method increases the value of gas storages. While the components they included are unobservable, latent components of their price process, the price process introduced in Sect. 4 of this paper includes two exogenous components that are also observable. The normalized cumulated heating degree days and the 180-day average of the oil price can be observed directly and easily included in the exercise decision, which a trader has to take on a daily basis. The least squares Monte Carlo method including further factors is described in Sect. 5.1, and valuation examples are given in Sect. 5.2.
5.1 Extensions of Least Squares Monte Carlo Algorithm
Including Exogenous Components


A gas storage is characterized by the following restrictions: 

• The filling level must lie between given minimum and maximum volumes at all times $0 \le t \le T+1$: $v_{\min}(t) \le v(t) \le v_{\max}(t)$

• On each day, volume changes are limited by the withdrawal and injection rates: $\Delta v_{\min}(t, v(t)) \le \Delta v \le \Delta v_{\max}(t, v(t))$
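Taken together, the two restrictions define the set of admissible volume changes on a given day. The sketch below assumes that the rate limits are given as plain numbers for the current day (in general they may depend on the filling level) and returns the resulting interval; the function name is hypothetical.

```python
def admissible_range(v, v_min_next, v_max_next, dv_min, dv_max):
    """Interval [lo, hi] of admissible volume changes dv on a day that starts
    with volume v: the rate limits dv_min <= dv <= dv_max must hold and the
    next filling level v + dv must lie in [v_min_next, v_max_next].
    Returns None if no action is admissible."""
    lo = max(dv_min, v_min_next - v)
    hi = min(dv_max, v_max_next - v)
    return (lo, hi) if lo <= hi else None
```

For example, a storage with 100 MWh capacity and rates of 1 MWh/day that is nearly full at 99.5 MWh can still withdraw 1 MWh but inject only 0.5 MWh.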

From a mathematical point of view, a swing option is a special case of a gas storage. During the delivery period, the gas delivery for the next day is nominated daily, subject to the following restrictions:

• Daily contract quantity (DCQ): minimum as well as maximum daily volume; 
typical values are DCQmin 50-90% and DCQmax 100-110% of a given DCQ 
reference (where DCQ = ACQ/365) 

• Annual contract quantity (ACQ): minimum as well as maximum yearly volume; 
typical values are ACQmin 80-90% and ACQmax 100-110% of a given ACQ 
reference 

Given these restrictions, a swing option is equivalent to a storage whose initial volume equals the ACQmax of the swing,

$$v_{\min}(0) = v_{\max}(0) = \mathrm{ACQ}_{\max}, \qquad (7)$$

with the following restriction on the final volume

$$0 = v_{\min}(T+1) \le v(T+1) \le v_{\max}(T+1) = \mathrm{ACQ}_{\max} - \mathrm{ACQ}_{\min}, \qquad (8)$$

and from which only withdrawal is possible:

$$-\mathrm{DCQ}_{\max} = \Delta v_{\min}(t, v(t)) \le \Delta v \le \Delta v_{\max}(t, v(t)) = -\mathrm{DCQ}_{\min}. \qquad (9)$$
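The mapping from swing contract terms to storage parameters can be written down directly: the storage starts full at ACQmax, allows only withdrawals between DCQmin and DCQmax per day, and may end with at most ACQmax - ACQmin so that at least ACQmin has been taken. As an illustration we use ACQmax = 438 MWh (the start volume of the swings in Table 1) and ACQmin = 292 MWh, a value we infer from Table 1's maximum end volume of 146 MWh; the latter is an assumption, as is the function name.

```python
def swing_as_storage(acq_min, acq_max, dcq_min, dcq_max):
    """Storage parameters equivalent to a swing contract: start full at ACQmax,
    withdraw only (between DCQmin and DCQmax per day) and end with at most
    ACQmax - ACQmin, so that at least ACQmin has been withdrawn."""
    return {
        "start_volume": acq_max,
        "max_volume": acq_max,
        "min_end_volume": 0.0,
        "max_end_volume": acq_max - acq_min,
        "dv_min": -dcq_max,  # largest possible daily withdrawal
        "dv_max": -dcq_min,  # smallest mandatory daily withdrawal
    }
```

With DCQmin = 0.6 and DCQmax = 1.2 MWh/day this reproduces the inflexible swing column of Table 1.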


We assume that the storage is available from time $t = 0$ until time $t = T+1$ and that the holder may take an action at any discrete date $t = 1, \ldots, T$ after the spot price $G_t$ is known. Let $v(t)$ denote the volume in storage at the start of day $t$ and $\Delta v$ the volume change during day $t$. An injection corresponds to $\Delta v > 0$, while $\Delta v < 0$ means withdrawal from the storage. The payoff on day $t$ is


$$h(G_t, \Delta v) = \begin{cases} (-G_t - c_{IN,t}) \cdot \Delta v, & \Delta v > 0, \\ (-G_t + c_{WD,t}) \cdot \Delta v, & \Delta v < 0. \end{cases} \qquad (10)$$


Here $c_{WD,t}$ denotes the withdrawal costs and $c_{IN,t}$ the injection costs on day $t$, which can differ and may include a bid-ask spread.
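The daily payoff can be coded in a few lines. The sketch follows the sign convention that injecting (dv > 0) means buying gas at the spot price plus injection costs, while withdrawing (dv < 0) means selling at the spot price minus withdrawal costs; for the swings in Table 1 the withdrawal costs of 27 EUR/MWh play the role of a strike. The function name is hypothetical.

```python
def payoff(price, dv, c_in=0.0, c_wd=0.0):
    """Daily payoff h(G_t, dv) in EUR: dv > 0 injects (costs price + c_in per MWh),
    dv < 0 withdraws (earns price - c_wd per MWh), dv == 0 does nothing."""
    if dv > 0:
        return (-price - c_in) * dv
    if dv < 0:
        return (-price + c_wd) * dv
    return 0.0
```

Withdrawing 1.2 MWh at a spot price of 30 EUR/MWh with withdrawal costs of 27 EUR/MWh yields 3.6 EUR.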
Let $U(t, G_t, v(t))$ be the value of the flexibility at time $t$, starting at volume level $v(t)$. By $C(t, G_t, v(t), \Delta v)$ we denote the continuation value after taking an allowed action $\Delta v \in \mathcal{D}(t, v(t))$, where $\mathcal{D}(t, v(t))$ is the set of all admissible actions at time $t$ if the filling level is $v(t)$. If $r(t)$ is the interest rate at time $t$, then

$$C(t, G_t, v(t), \Delta v) = \mathbb{E}\left[e^{-r(t+1)}\, U(t+1, G_{t+1}, v(t) + \Delta v)\right]. \qquad (11)$$

The continuation value depends only on $v(t+1) := v(t) + \Delta v$. Therefore, we will from now on also write $C(t, G_t, v(t+1))$ for short. With this notation, the flexibility value $U(t, G_t, v(t))$ satisfies the following dynamic program:

$$\begin{aligned} U(T+1, G_{T+1}, v(T+1)) &= q(G_{T+1}, v(T+1)), \\ U(t, G_t, v(t)) &= \max_{\Delta v \in \mathcal{D}(t, v(t))} \left[ h(G_t, \Delta v) + C(t, G_t, v(t), \Delta v) \right] \end{aligned} \qquad (12)$$

for all times $t$. In the first equation, $q$ is a possible penalty that depends on the volume level at time $T+1$ and on the spot price $G_{T+1}$ at that time.

As the continuation value cannot be determined analytically, we use the least squares Monte Carlo method to approximate it by

$$C(t, G_t, v(t+1)) \approx \sum_{l=0}^{m} \beta_{l,t}(v(t+1)) \cdot \phi_l(G_t) \qquad (13)$$

using basis functions $\phi_l$. Given $N$ price scenarios, estimates $\hat\beta_{l,t}(v(t+1))$ of the coefficients $\beta_{l,t}(v(t+1))$ are obtained by regression. These coefficients yield an approximation $\hat C(t, G_t, v(t+1))$ of the continuation value, which is used to determine an approximately optimal action $\Delta v(t)$ for all volumes $v(t)$.

Moreno and Navas [11] have shown that the specific choice of basis functions has little influence on the results. For this reason, we have chosen the easy-to-handle polynomial basis functions $\phi_l(G_t) = G_t^l$. Our calculations show that $m = 3$ is sufficient to obtain good results; a higher number of basis functions leads to similar results.
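The regression step of Eq. (13) for one time step and one fixed next filling level can be sketched as follows, using the polynomial basis $\phi_l(G_t) = G_t^l$ with $m = 3$. The full algorithm repeats this for every time step and every volume on a grid and then picks the action maximizing payoff plus estimated continuation value; that outer loop is not shown, and the function names are hypothetical.

```python
import numpy as np

def continuation_coefficients(prices_t, discounted_next_values, m=3):
    """Regress the discounted values U(t+1, G_{t+1}, v(t+1)) of all N scenarios
    (for one fixed v(t+1)) on the basis G_t**0, ..., G_t**m."""
    basis = np.vander(np.asarray(prices_t, float), m + 1, increasing=True)
    y = np.asarray(discounted_next_values, float)
    beta, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return beta

def continuation_value(beta, prices_t):
    """Evaluate the fitted approximation sum_l beta_l * G_t**l."""
    basis = np.vander(np.asarray(prices_t, float), len(beta), increasing=True)
    return basis @ beta
```

Since raw price levels raised to the third power can make the regression poorly conditioned, scaling the prices before building the basis is a common practical choice.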

Boogert and de Jong [6] use a multi-factor price process and include the factors of the price process in the basis used for the regression in the least squares Monte Carlo method. While their factors are unobservable, our price process (see Eq. (4)) includes two exogenous factors that can easily be observed. We include the oil price component $\Psi_t$ (see Sect. 3) and the temperature component $A_t$ (see Sect. 2) in the regression by using
$$C(t, G_t, \Psi_t, A_t, v(t), \Delta v) \approx \sum_{l=0}^{m} \beta_{l,t}\, G_t^l + \beta_{m+1,t}\, \Psi_t + \beta_{m+3,t}\, \Psi_t \cdot G_t + \beta_{m+4,t}\, A_t + \beta_{m+5,t}\, A_t \cdot G_t. \qquad (14)$$

To simplify notation, we omit the explicit dependence of the coefficients on the filling level $v(t+1)$, as Boogert and de Jong [6] do throughout their paper. Monomials of higher degree in the oil price or temperature components, as well as higher-order mixed terms, have also been examined but do not yield better results.
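Extending the regression basis by the exogenous components amounts to adding columns to the design matrix. The sketch assumes that, besides the spot monomials, the basis contains exactly the terms $\Psi_t$, $\Psi_t G_t$, $A_t$ and $A_t G_t$ of Eq. (14); the exact set of terms used by the authors is an assumption here, as is the function name.

```python
import numpy as np

def design_with_exogenous(G, Psi, A, m=3):
    """Regression basis: spot monomials G**0..G**m, extended by the oil price
    component Psi, the mixed term Psi*G, the temperature component A and the
    mixed term A*G. One row per price scenario."""
    G, Psi, A = (np.asarray(x, float) for x in (G, Psi, A))
    spot = np.vander(G, m + 1, increasing=True)
    return np.column_stack([spot, Psi, Psi * G, A, A * G])
```

The coefficients are then estimated exactly as in the spot-only case, for example with `np.linalg.lstsq(design, y, rcond=None)`.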


5.2 Influence of Exogenous Components on Valuation Results 


Gas storages and swing options are not merely virtual products but real options: traders need to take exercise decisions on a daily basis, and these decisions depend on all observable market information. To reflect this behavior in the pricing algorithm, we use the least squares Monte Carlo method described above in combination with the spot price model of Sect. 4. The examples in this section are artificial gas storages and swing options valued at two different dates, 4 July 2012 and 2 April 2013. These dates are characterized by very different implied volatilities observed in the market: for TTF, for example, the long-term volatility decreased significantly over the 8-month period from 25 % to 12 % (source: ICE). At the same time, the summer-winter spread between winter 13/14 and summer 13 decreased from 2.40 EUR/MWh to 1.20 EUR/MWh, whereas the price level increased from 26.15 EUR/MWh to 27.70 EUR/MWh.

TTF market prices have been used to value a slow and a fast storage that are identical to the ones valued by Boogert and de Jong [6]. In addition, we have valued a flexible and an inflexible swing contract. The parameters of these storages and swings are given in Table 1. All valuations use 5,000 price scenarios, which yields sufficiently converged results.

We denote by daily intrinsic value the value obtained when a daily price forward curve is taken and the optimal exercise is calculated (using a deterministic dynamic program). This value could be locked in immediately if each single future day could be traded as an individual forward contract. The fair value denotes the value resulting from the least squares Monte Carlo method, and the extrinsic value is the difference between the fair value and the daily intrinsic value. The extrinsic value is therefore a measure of the value of the flexibility included in the considered real option.
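The daily intrinsic value can be computed by backward induction over a volume grid. The sketch assumes an integer volume grid from 0 to v_max (e.g. MWh), integer rate limits, no discounting, and an infinite penalty for ending above the allowed end volume; these are simplifying assumptions, and the function name is hypothetical.

```python
def daily_intrinsic(prices, v_max, v_start, v_end_max, dv_min, dv_max,
                    c_in=0.0, c_wd=0.0):
    """Deterministic dynamic program on the integer volume grid 0..v_max:
    optimal value when every day's forward price in `prices` is known."""
    NEG = float("-inf")
    # value[v]: best value from the current day onwards, starting with volume v
    value = [0.0 if v <= v_end_max else NEG for v in range(v_max + 1)]
    for price in reversed(prices):
        new = [NEG] * (v_max + 1)
        for v in range(v_max + 1):
            for dv in range(max(dv_min, -v), min(dv_max, v_max - v) + 1):
                if value[v + dv] == NEG:
                    continue  # leads to an infeasible end state
                cash = (-price - c_in) * dv if dv > 0 else (-price + c_wd) * dv
                new[v] = max(new[v], cash + value[v + dv])
        value = new
    return value[v_start]
```

For instance, a storage of 1 MWh facing prices of 1 and 3 EUR/MWh injects on the first day and withdraws on the second, so its intrinsic value is 2 EUR.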

Table 1 Parameters for gas storages and swing options from 1.4.2013-1.4.2014

| Parameter | Slow storage | Fast storage | Inflexible swing | Flexible swing |
|---|---|---|---|---|
| Min volume | 0 MWh | 0 MWh | 0 MWh | 0 MWh |
| Max volume | 100 MWh | 100 MWh | 438 MWh | 438 MWh |
| Min injection | 0 MWh/day | 0 MWh/day | - | - |
| Max injection | 1 MWh/day | 2 MWh/day | - | - |
| Min withdrawal | 0 MWh/day | 0 MWh/day | 0.6 MWh/day | 0 MWh/day |
| Max withdrawal | 1 MWh/day | 5 MWh/day | 1.2 MWh/day | 1.2 MWh/day |
| Injection costs | 0 EUR/MWh | 0 EUR/MWh | - | - |
| Withdrawal costs | 0 EUR/MWh | 0 EUR/MWh | 27 EUR/MWh | 27 EUR/MWh |
| Start volume | 0 MWh | 0 MWh | 438 MWh | 438 MWh |
| Max end volume | 0 MWh | 0 MWh | 146 MWh | 146 MWh |

Table 2 Results for valuation date 4 July 2012 (5,000 scenarios)

| Contract | Factors in regression | Daily intrinsic | Fair value | Extrinsic value |
|---|---|---|---|---|
| Slow storage | Spot | 360.8 | 382.4 | 21.6 |
| | Spot & Brent | 360.8 | 549.5 | 188.7 |
| | Spot & Brent & HDD | 360.8 | 571.2 | 210.4 |
| Fast storage | Spot | 517.1 | 561.8 | 44.7 |
| | Spot & Brent | 517.1 | 1,006.6 | 489.5 |
| | Spot & Brent & HDD | 517.1 | 1,090.1 | 572.9 |
| Inflexible swing | Spot | -126.2 | 274.5 | 400.7 |
| | Spot & Brent | -126.2 | 285.4 | 411.6 |
| | Spot & Brent & HDD | -126.2 | 286.3 | 412.4 |
| Flexible swing | Spot | -41.6 | 356.5 | 398.1 |
| | Spot & Brent | -41.6 | 397.2 | 438.8 |
| | Spot & Brent & HDD | -41.6 | 959.6 | 1,001.2 |

As can clearly be seen by comparing Tables 2 and 3, the decrease of the summer-winter spread results in a lower daily intrinsic value for the storages. In contrast, the intrinsic value of the flexible swing increases because of the higher price level in 2013 compared to 2012. Furthermore, the decrease of volatility does not change the extrinsic value of the two storages, which is in marked contrast to the swings.

For storages, these findings correspond very well to the observations of Boogert and de Jong [6]. They also found that a change of volatility in the long-term component does not increase the value of gas storages and may even decrease it. An explanation for this behavior is that it becomes more difficult for traders to decide correctly whether today's price is high or low, and therefore whether withdrawal, injection, or no action makes most sense. Since decisions are taken under uncertainty about the future price development, higher volatility leads to more wrong decisions, which may decrease the value, at least in the case of fast storages.

Table 3 Results for valuation date 2 April 2013 (5,000 scenarios)

| Contract | Factors in regression | Daily intrinsic | Fair value | Extrinsic value |
|---|---|---|---|---|
| Slow storage | Spot | 227.3 | 309.5 | 82.2 |
| | Spot & Brent | 227.3 | 419.1 | 191.8 |
| | Spot & Brent & HDD | 227.3 | 411.7 | 184.4 |
| Fast storage | Spot | 353.5 | 593.4 | 240.0 |
| | Spot & Brent | 353.5 | 855.0 | 501.6 |
| | Spot & Brent & HDD | 353.5 | 877.0 | 523.5 |
| Inflexible swing | Spot | 310.0 | 485.2 | 175.2 |
| | Spot & Brent | 310.0 | 488.0 | 177.9 |
| | Spot & Brent & HDD | 310.0 | 471.9 | 161.9 |
| Flexible swing | Spot | 324.1 | 542.1 | 218.0 |
| | Spot & Brent | 324.1 | 558.5 | 234.4 |
| | Spot & Brent & HDD | 324.1 | 572.2 | 248.1 |

The situation is completely different for swing options: their value increases with increasing volatility. This is not surprising, as a special case shows. If the yearly restriction is not binding, the swing is equivalent to a strip of European options. In this case, it is well known that an increase of volatility implies an increase of the extrinsic option value under quite general assumptions on the underlying stochastic process, see e.g. Bergenthum and Rüschendorf [4].
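The volatility argument for a strip of European options can be checked numerically in the simplest setting. The sketch prices each delivery day as a European call in the Black (1976) model, a lognormal assumption that differs from the paper's spot price model, and sums the prices. The illustrative inputs (flat forwards at 27.70 EUR/MWh, strike 27 EUR/MWh, volatilities of 12 % and 25 %) are borrowed from the text and Table 1 purely for orientation, and the function names are hypothetical.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black76_call(forward, strike, sigma, tau, rate=0.0):
    """Black (1976) price of a European call on a forward with maturity tau."""
    d1 = (log(forward / strike) + 0.5 * sigma**2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return exp(-rate * tau) * (forward * norm_cdf(d1) - strike * norm_cdf(d2))

def strip_value(forwards, strike, sigma, taus, volume=1.0):
    """Value of a strip of calls, one per delivery date."""
    return volume * sum(black76_call(f, strike, sigma, t)
                        for f, t in zip(forwards, taus))
```

Evaluating the strip at 12 % and at 25 % volatility shows the higher value for the higher volatility, in line with the positive vega of each call.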

Another important difference between swings and storages is their behavior when the exogenous components of the spot price process are included in the regression of the algorithm. For the value of storages, the oil price component is much more important than for swings. For the inflexible swing, both components are irrelevant, while for the flexible swing the temperature component is more important than the oil price component. For storages, the oil price component is a measure of normal long-term price levels. As prices revert to this long-term level, which is mainly defined by the oil price component, a price above this level is favorable for withdrawal, while a price below it is favorable for injection. Including this component in the regression is therefore very important for the exercise decision and increases the value.

Another interesting observation is the influence of the two exogenous components on the less flexible products. While including the oil price component increases the fair value, additionally including the temperature component decreases the value slightly for the valuation date 2 April 2013, but not for 4 July 2012. One important reason is that in April 2013 a long winter, which was quite normal in terms of heating degree days, had just ended and the linear return to zero was just starting, whereas winter 2011/12 had been very warm and by July the linear return, with a slight gradient, was already half finished.

To sum up, these results indicate that it is very important to include both exogenous components in the exercise decision for storages as well as swings, as this can significantly increase the extrinsic value.
6 Conclusion

The spot price model of Stoll and Wiebauer [14], with temperature as the only exogenous factor, is not able to explain the gas price behavior of recent years. We have shown that adding an oil price component as a further exogenous factor remarkably improves the model fit. This component is not only a good proxy for economic influences on the price but also approximates the oil price indexation in gas import contracts on Central European gas markets. These fundamental reasons and the improved model fit justify including this model component. The simulation paths resulting from the model are reliable. Including both exogenous factors in least squares Monte Carlo algorithms for option valuation remarkably affects the valuation results.

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Copula-Specific Credit Portfolio Modeling

How the Sector Copula Affects the Tail of the Portfolio 
Loss Distribution 


Matthias Fischer and Kevin Jakob 


Abstract Traditionally, banks use individual credit portfolio models to estimate the economic capital that has to be reserved for unexpected credit losses. Many of these models have their roots in the CreditRisk+ or the CreditMetrics framework, both of which were launched in 1997. Current regulatory requirements oblige banks to analyze how sensitive their models (and the resulting risk figures) are with respect to the underlying assumptions. Within this context, we concentrate on the dependence structure, expressed in terms of copulas, in both frameworks. By replacing the underlying copula with other popular competitors, we quantify the effect on the tail in general and on the risk figures in particular for a hypothetical loan portfolio.


1 Introduction 


After the market crash of October 1987, Value-at-Risk (VaR) became a popular management tool in financial firms. Practitioners and policy makers have invested in implementing and exploring a variety of new models. However, as a consequence of the financial market turmoil around 2007/2008, the concept of VaR became the subject of fierce debate. Yet a few years after the crisis, VaR is still in use, albeit with greater awareness of its limitations (model risk) or in combination with scenario analysis or stress testing. In particular, banks are required to critically analyze and validate the VaR models they employ, which form the basis of their internal capital adequacy assessment process (ICAAP, see BaFin [1, AT.4.1]). In this context, the term "model validation" should refer to assessing whether the assumptions of the model are valid. Model assumptions, not computational errors, were the focus of the most common criticisms of quantitative models in the crisis. In particular, banks should be aware of errors in the assumptions underlying their models, which form a crucial part of model risk that was probably underestimated in past model risk management practice. In line with current regulatory requirements (see, e.g., BaFin [1] or Board of Governors of the Federal Reserve System [2]), banks are also required to quantify how sensitive their models and the resulting risk figures are to modifications of fundamental assumptions.

The focus of this contribution is solely on credit risk, one of the most important risk types in classical banking. Typically, the amount of economic capital that has to be reserved for credit risk is determined with a credit portfolio model. Two of the most widespread models are CreditMetrics, launched by JP Morgan (see Gupton et al. [3]), and CreditRisk+, an actuarial approach proposed by Credit Suisse Financial Products (CSFP, see Wilde [4]). Shortly after their publication, Koyluoglu and Hickman [5], Crouhy [6], and Gordy [7] offered a comparative anatomy of both models and described quite precisely where the models differ in functional form, distributional assumptions, and reliance on approximation formulae. Sector dependence, however, was not the focus of these studies.

A crucial issue in credit portfolio modeling is the realistic modeling of dependencies between counterparties. Typically, all counterparties are assigned to one or more (industry/country) sectors. Consequently, high-dimensional counterparty dependence can be reduced to low(er)-dimensional sector dependence, which describes how the sector variables are coupled. Against this background, we focus on the impact of different dependence structures, represented in terms of copulas, within credit portfolio models. Building on Jakob and Fischer [8], we extend the analysis of the CreditRisk+ model to CreditMetrics and provide comparisons between both frameworks. For this purpose, we first work out the implicit and explicit sector copulas of both classes and then quantify the effect of exchanging the copula on the risk figures for a hypothetical loan portfolio and a variety of recent flexible parametric copulas.

The outline is as follows. In Sect. 2, we review the classical copula concept and briefly introduce the copulas used in the analysis. Section 3 summarizes and compares the underlying credit portfolio models with special emphasis on the underlying sector dependence. In Sect. 4, we empirically demonstrate the influence of different copula models on the upper tail of the loss distribution and, hence, on the risk figures for a hypothetical but realistic loan portfolio. Section 5 concludes.
2 Copulas Under Consideration

The concept of copulas dates back to Sklar [9]. In general, a copula is a multivariate distribution function on the $d$-dimensional unit hypercube with uniform one-dimensional margins.¹ With the help of a copula function, one can decompose an arbitrary multivariate distribution into its margins and its dependence structure: according to Sklar's theorem, for any multivariate distribution function $F$ on $\mathbb{R}^d$ with univariate margins $F_i$, there exists a unique function $C : \times_{i=1}^{d} \operatorname{Im}(F_i) \to [0, 1]$ such that $F(x) = C(F_1(x_1), \ldots, F_d(x_d))$ for all $x \in \mathbb{R}^d$. Conversely, for arbitrary univariate distribution functions $F_i$ and a copula $C$, the function $F$ defined in this way is a valid multivariate distribution function. Because our focus is solely on the dependence structure between economic sectors, we will use Sklar's theorem in the second direction: by exchanging the copula, we can construct new multivariate distributions without affecting the margins.
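The second direction of Sklar's theorem can be illustrated by sampling: draw uniforms from a copula and push them through arbitrary marginal quantile functions. The sketch assumes a Gaussian copula and exponential margins (Gamma margins with shape one, chosen only because their quantile function is explicit); replacing the copula sampler changes the dependence while leaving the margins untouched. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def gaussian_copula_sample(corr, n, rng):
    """Draw n samples from the Gaussian copula with correlation matrix corr."""
    corr = np.asarray(corr, float)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    cdf = np.vectorize(NormalDist().cdf)
    return cdf(z)  # uniform margins, Gaussian dependence

def with_exponential_margins(u, rates):
    """Apply the quantile functions F_i^{-1}(u) = -log(1 - u) / rate_i."""
    return -np.log1p(-u) / np.asarray(rates, float)
```

The resulting sample has exponential margins with means 1/rate_i, whatever copula produced the uniforms.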

Already at the beginning of this century, Li [12] incorporated the concept of 
copulas into the CreditMetrics model. Ebmeyer et al. [13] used a Gaussian and a 
t-copula within the CreditRisk + framework to model sector dependencies. Our aim 
is to extend these studies to a broader range of copulas and to establish a comparison 
between both portfolio models regarding the sensitivity of the risk figures with respect 
to the sector dependence. In addition to the original dependence structures, i.e., the 
Gaussian copula (CreditMetrics) and a specific factor copula (CreditRisk + ), we apply 
the following parametric competitors: 

• elliptical copulas, i.e., the Gaussian copula (GC) and the t-copula (TC) (see McNeil et al. [14]),

• generalized hyperbolic copulas (GHC), implicitly defined by the family of generalized hyperbolic distributions (see Barndorff-Nielsen [15]),

• Archimedean copulas (AC), for example the Gumbel, Clayton, Joe or Frank copula, and hierarchical Archimedean copulas (HAC) (see Savu and Trede [16], McNeil [17] or Hofert and Scherer [18]),

• pair copula constructions (PCC) (see Aas et al. [19]). 

To estimate the unknown parameters, e.g., the dispersion matrix in the case of the GC, we use the maximum likelihood (ML) approach. Other techniques, e.g., inverting Kendall's τ, may also be possible. In the case of the HAC and PCC, one also has to choose a suitable nesting or vine structure,² respectively. For this purpose, we applied the methods implemented in the R packages "HAC" by Okhrin and Ristig [20] and "VineCopula" by Schepsmeier et al. [21], respectively. Further information about the estimation is given in Sect. 4.3. For more details about the model selection process, we also refer to the cited articles.
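The alternative of inverting Kendall's τ relies on closed-form relations between τ and the copula parameter, for example $\rho = \sin(\pi\tau/2)$ for elliptical copulas, $\theta = 2\tau/(1-\tau)$ for the Clayton copula and $\theta = 1/(1-\tau)$ for the Gumbel copula. The sketch below computes the sample τ (tau-a, with ties counted as zero) by an O(n²) pair count and applies these relations; it is not the ML approach used in the paper, and the function names are hypothetical.

```python
import math

def kendall_tau(x, y):
    """Sample Kendall's tau-a via an O(n^2) count of concordant/discordant pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)

def invert_tau(tau, family):
    """Copula parameter implied by Kendall's tau for some bivariate families."""
    if family in ("gaussian", "t"):
        return math.sin(math.pi * tau / 2)  # correlation parameter rho
    if family == "clayton":
        return 2 * tau / (1 - tau)          # theta > 0 for tau > 0
    if family == "gumbel":
        return 1 / (1 - tau)                # theta >= 1 for tau >= 0
    raise ValueError(f"unsupported family: {family}")
```

For higher dimensions, the pairwise τ values would be inverted entry by entry, which for elliptical copulas may require a subsequent correction to a positive definite matrix.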


1 In general, we assume that the reader is already familiar with the concept of copulas as well as the most popular classes. For details, we refer to Joe [10] and Nelsen [11].

2 A vine is a directed acyclic graph, representing the decomposition sequence of a multivariate 
density function. 
3 A Comparison Between CreditRisk+ and CreditMetrics

In this section, we briefly introduce both CreditMetrics and CreditRisk+ in a comparative way to highlight their differences.


3.1 Preliminary Notes and General Remarks 

CreditMetrics was developed by a group of investment banks led by J.P. Morgan (see Gupton et al. [3]). It follows a mark-to-market approach and includes default risk as well as migration risk.³ To ensure comparability across both models, we focus solely on default risk. In practice, however, migration risk is also very important and should not be neglected. CreditMetrics belongs to the class of threshold models (see McNeil et al. [14]). Here, the creditworthiness of each obligor is governed by a latent variable, which is driven by the state of the overall economy or of a specific sector/region as well as by an idiosyncratic factor. A default occurs if this latent variable crosses a predefined threshold determined by the obligor's initial probability of default (PD).

In contrast, CreditRisk+ belongs to the class of actuarial models. It was developed by the Financial Products division of Credit Suisse (see Wilde [4]). The default distribution of each counterparty is influenced by one or several factors. As in CreditMetrics, these factors depend on the current state of the economy as well as on idiosyncratic components. Conditional on these factors, defaults are assumed to be independent of each other.

A major difference between the two models is how the portfolio loss distribution is obtained. Whereas the CreditMetrics framework requires a Monte Carlo simulation to estimate the latter, it can be calculated analytically within the CreditRisk+ framework. A numerically stable algorithm is described in Gundlach and Lehrbass [22, Chap. 5].
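To make the analytic computation concrete, the following sketch treats the simplest case: a single sector to which every obligor is fully assigned, with a Gamma sector variable of mean one and variance sigma2. The number of defaults is then negative binomial, and the loss distribution in multiples of the loss unit follows from the classical Panjer recursion. This is not the numerically stable algorithm of Gundlach and Lehrbass [22], which it does not reproduce; Panjer-type recursions can suffer from numerical instability in larger multi-sector settings. The function name is hypothetical.

```python
def creditriskplus_one_sector(exposures, pds, sigma2, n_max):
    """P(L = n * U) for n = 0..n_max in a single-sector CreditRisk+ portfolio.
    exposures: integer exposures in loss units (>= 1); pds: (adjusted) PDs;
    sigma2: variance of the Gamma sector variable, whose mean is 1."""
    mu = sum(pds)                          # expected number of defaults
    alpha = 1.0 / sigma2                   # Gamma shape parameter
    p = sigma2 * mu / (1.0 + sigma2 * mu)  # negative binomial parameter
    g = [0.0] * (n_max + 1)                # severity distribution P(exposure = k)
    for e, pd in zip(exposures, pds):
        if e <= n_max:
            g[e] += pd / mu
    a, b = p, (alpha - 1.0) * p            # Panjer (a, b, 0) coefficients
    f = [(1.0 - p) ** alpha] + [0.0] * n_max  # P(L = 0) = P(no default)
    for n in range(1, n_max + 1):
        f[n] = sum((a + b * k / n) * g[k] * f[n - k] for k in range(1, n + 1))
    return f
```

The expected loss of the resulting distribution equals the sum of exposure times PD, which provides a simple consistency check.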

3.2 Theoretical Background 

3.2.1 Model Input 

We assume that for each counterparty $i = 1, \ldots, N$ the exposure at default ($\mathrm{EAD}_i$), the loss given default ($\mathrm{LGD}_i$) and the (unconditional) probability of default ($\mathrm{PD}_i$) are known and non-stochastic. We also assume that all business transactions of an obligor have been aggregated into a single position per counterparty. To derive the loss distribution analytically, CreditRisk+ requires the exposures to be discretized with respect to a so-called loss unit $U > 0$. The original values of $\mathrm{EAD}_i$ and $\mathrm{PD}_i$


3 Migration risk includes the financial risk due to a change of the portfolio value caused by rating 
migrations (i.e., down- and upgrade). 
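are replaced by

$$\widetilde{\mathrm{EAD}}_i := \max\left\{\left[\frac{\mathrm{EAD}_i \cdot \mathrm{LGD}_i}{U}\right], 1\right\} \quad \text{and} \quad \widetilde{\mathrm{PD}}_i := \frac{\mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathrm{PD}_i}{\widetilde{\mathrm{EAD}}_i \cdot U},$$

respectively. The adjustment of the PDs ensures that the expected loss of the portfolio is not affected by the discretization, i.e.,

$$\mathbb{E}(L) = \sum_{i=1}^{N} \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathrm{PD}_i = \sum_{i=1}^{N} \widetilde{\mathrm{EAD}}_i \cdot U \cdot \widetilde{\mathrm{PD}}_i = \mathbb{E}(\tilde L).$$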

To simplify notation, we omit the tilde for the discretized exposure and PD in the following and denote them also by $\mathrm{EAD}_i$ and $\mathrm{PD}_i$, respectively. Since CreditMetrics is a simulation-based model, no such adjustment is necessary there.
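The discretization and the PD adjustment can be sketched as follows. We assume that exposures are rounded to the nearest integer number of loss units with a minimum of one unit; the exact rounding rule is an assumption. Whatever the rounding, the adjusted PDs preserve the expected loss exactly. The function name is hypothetical.

```python
def discretize(ead, lgd, pd, unit):
    """Discretize EAD*LGD to integer multiples of the loss unit and adjust the
    PDs so that the expected loss EAD*LGD*PD is preserved.
    Assumption: rounding to the nearest integer, with at least one unit."""
    nu = [max(round(e * l / unit), 1) for e, l in zip(ead, lgd)]
    pd_adj = [e * l * p / (n * unit) for e, l, p, n in zip(ead, lgd, pd, nu)]
    return nu, pd_adj
```

An exposure that would round to zero units is kept at one unit, and its PD is scaled down accordingly.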


3.2.2 Sector Variables and Sector Dependencies 

In order to introduce dependencies between counterparties, every obligor is mapped to one or several out of $K$ sectors. Since the interpretations and assumptions behind the sector variables and the corresponding counterparty-specific sector weights differ, we use an individual notation for each model. In CreditMetrics, the vector of sector variables $X = (X_1, \ldots, X_K)^T$ is assumed to follow a multivariate normal distribution. Each sector variable $X_k$, $k = 1, \ldots, K$, has a standard normal law, and the copula of $X$ is a Gaussian one with dispersion matrix $\Sigma$.

Within CreditRisk+, the sector variables $S_k$ are assumed to follow a Gamma law with specific shape and scale parameters such that $\mathbb{E}(S_k) = 1$ for all $k = 1, \ldots, K$. The choice of the Gamma distribution was motivated by the fact that, in combination with Poisson-distributed defaults, the loss distribution can be derived analytically. To specify the sector distributions, the sector variances $\sigma_k^2$ can be estimated from empirical data, for example insolvency rates. In the original model of 1997, the variables $S_k$ are also assumed to be independent of each other. In contrast, we apply the so-called CBV approach, an extension to correlated sectors published by Fischer and Dietz [23]. Here, each single sector variable is driven by a linear combination of $L + 1$ independent Gamma-distributed variates, i.e.,

$$S_k = \sum_{\ell=1}^{L} \gamma_{k,\ell}\, \tilde S_\ell + \hat S_k, \qquad k = 1, \ldots, K, \qquad (1)$$

with non-negative weights $\gamma_{k,\ell}$ for $k = 1, \ldots, K$ and $\ell = 1, \ldots, L$. The vector $\tilde S := (\tilde S_1, \ldots, \tilde S_L)$, with $\tilde S_\ell \sim \Gamma(\tilde\vartheta_\ell, \tilde\delta_\ell)$, is called the common-background vector (CBV). Besides this vector, each sector variable is also affected by an individual component $\hat S_k \sim \Gamma(\vartheta_k, \delta_k)$. Because all variables $\tilde S_\ell$ and $\hat S_k$ are assumed to be independent of each other, the CBV extension can be reduced to the basic CreditRisk+ model, so the CBV model can also be solved analytically. For further details on the estimation of the Gamma parameters, we refer to Fischer and Dietz [23].
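The structure of Eq. (1) is easy to simulate, which helps to see how the common-background variables induce correlation between sectors. The sketch uses numpy's (shape, scale) parametrization of the Gamma law, which may differ from the parametrization in Fischer and Dietz [23]; the weights and parameters in the example are hypothetical and chosen so that each sector variable has mean one.

```python
import numpy as np

def simulate_cbv_sectors(gamma, cbv_shape, cbv_scale, ind_shape, ind_scale, n, rng):
    """Sample S_k = sum_l gamma[k, l] * S~_l + S^_k with independent Gamma
    variables: L common-background components and K individual components.
    Returns an (n, K) array."""
    gamma = np.asarray(gamma, float)                                    # K x L weights
    common = rng.gamma(cbv_shape, cbv_scale, size=(n, len(cbv_shape)))  # n x L
    indiv = rng.gamma(ind_shape, ind_scale, size=(n, len(ind_shape)))   # n x K
    return common @ gamma.T + indiv
```

Two sectors loading on the same background variable are positively correlated, while their marginal laws are in general no longer Gamma.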

In Eq. (1), the marginal distributions of $S = (S_1, \ldots, S_K)^T$ are in general no longer Gamma. The resulting univariate distribution was analyzed by Moschopoulos [24]. The copula of $S$ is a so-called multi-factor copula, which is discussed in a very general way by Oh and Patton [25] and by Mai and Scherer [26].

3.2.3 Default Mechanism 


In the CreditMetrics setting, a default occurs if obligor $i$'s creditworthiness$^{4}$, modeled by

$$A_i = R_i^T X + \sqrt{1 - R_i^T \Sigma R_i}\;\varepsilon_i, \qquad (2)$$

falls below $\Phi^{-1}(\mathrm{PD}_i)$, where $\Phi^{-1}$ denotes the quantile function of the standard normal distribution and $\varepsilon_i \sim N(0, 1)$ is independent of $X$ and of $\varepsilon_j$ for $i \neq j$. The vector $R_i \in [-1, 1]^K$, with the restriction $R_i^T \Sigma R_i < 1$, contains the so-called factor loadings, describing the correlation between a counterparty's asset value $A_i$ and the systemic factors $X_k$. Given a sector realization $x$ of $X$, the conditional PD derived from the asset process (2) reads

$$\mathrm{PD}_i^{CM}(X = x) = \Phi\left(\frac{\Phi^{-1}(\mathrm{PD}_i) - R_i^T x}{\sqrt{1 - R_i^T \Sigma R_i}}\right). \qquad (3)$$

In the CreditRisk+ model, the sector variables $S_k$ are assumed to influence the conditional PD according to

$$\mathrm{PD}_i^{CR+}(S) = \mathrm{PD}_i \left( w_{i,0} + \sum_{k=1}^{K} w_{i,k} S_k \right), \qquad (4)$$

with $w_i \in [0, 1]^K$ and $\sum_{k=1}^{K} w_{i,k} \le 1$, where $w_{i,0} := 1 - \sum_{k=1}^{K} w_{i,k}$ denotes the idiosyncratic weight. Equations (3) and (4) establish the connection between the sector variables and the counterparties' PDs. In CreditRisk+, $\mathrm{PD}_i^{CR+}$ serves as the intensity parameter of a Poisson distribution from which defaults are drawn independently for every counterparty, given the sector realization. The Poisson distribution is used instead of a Bernoulli distribution in order to obtain a closed-form expression for the loss distribution. As a consequence, multiple defaults of the same counterparty (especially of one with poor creditworthiness) are possible. This is a major drawback of the model, leading to an overestimation of the risk figures. In Sect. 4 we analyze the changes of the risk figures with respect to the underlying copula. Since our focus is on relative changes, this overestimation does not influence the comparison.


4 One should note that $A_i$ again has a standard Gaussian law. The dependence structure is described by a multi-factor copula, as in the case of the CreditRisk+ CBV model, but with a different parametrization.
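A minimal Python sketch of Eqs. (3) and (4) for an obligor mapped to a single sector, as in the portfolio of Sect. 4. It assumes a standard normal sector factor in the CreditMetrics case (so that $R_i^T \Sigma R_i$ reduces to the squared loading `r2`) and a Gamma sector variable with $E(S_k) = 1$ and $\mathrm{Var}(S_k) = \sigma_k^2$ in the CreditRisk+ case; function names and parameter values are illustrative only. The Monte Carlo averages check that both conditional PDs have mean $\mathrm{PD}_i$, and the last helper quantifies the Poisson probability of two or more defaults of one obligor, the drawback mentioned above.

```python
import math
import random
import statistics
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def pd_creditmetrics(pd, r2, x):
    """Eq. (3) for one sector: r2 is the squared loading, x the factor realization."""
    r = math.sqrt(r2)
    return PHI.cdf((PHI.inv_cdf(pd) - r * x) / math.sqrt(1.0 - r2))


def pd_creditriskplus(pd, w, s):
    """Eq. (4) for one sector: weight w on S_k, idiosyncratic weight w0 = 1 - w."""
    return pd * ((1.0 - w) + w * s)


def poisson_two_or_more(lam):
    """P(N >= 2) for N ~ Poisson(lam): probability of 'multiple defaults' of one obligor."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)


def mean_conditional_pds(pd, r2, sigma2, n=100_000, seed=0):
    """Monte Carlo means of both conditional PDs; each should be close to pd."""
    rng = random.Random(seed)
    cm = [pd_creditmetrics(pd, r2, rng.gauss(0.0, 1.0)) for _ in range(n)]
    # Gamma(shape=1/sigma2, scale=sigma2) has mean 1 and variance sigma2
    crp = [pd_creditriskplus(pd, 1.0, rng.gammavariate(1.0 / sigma2, sigma2))
           for _ in range(n)]
    return statistics.fmean(cm), statistics.fmean(crp)


print(mean_conditional_pds(pd=0.02, r2=0.05, sigma2=0.3))
print(poisson_two_or_more(0.02))
```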
4 Results on Estimated Copulas and Risk Figures


In this section, the estimation results for the sector copulas are presented, as well as their effect on economic capital.


4.1 Portfolio and Model Calibration 


Consider a hypothetical portfolio consisting of 5,000 counterparties, each mapped to exactly one$^{5}$ of ten industrial sectors. For reasons of simplicity, LGDs$^{6}$ are assumed to be deterministic and independent of the PDs. Since the absolute exposure values are chosen arbitrarily, we can assume w.l.o.g. that $\mathrm{LGD}_i = 1$ for all $i = 1, \ldots, 5000$. Because our focus is only on the relative changes of the risk figures rather than on absolute values, this simplification does not restrict our results. Table 1 summarizes the number of counterparties (#CP) and exposures by industrial sector, as well as the estimated sector parameters related to the marginal sector distributions. Although the portfolio itself is hypothetical, the distribution of exposures and counterparties across sectors might be characteristic of certain banks. Please note that in the case of CreditMetrics, higher values of $R_k^2$ indicate a stronger dependence on the systemic factors, leading to a higher risk for the respective sectors. In the CBV model


Table 1 Number of counterparties, percentage of exposures, factor loadings ($R_k^2$, CreditMetrics) and sector variances ($\sigma_k^2$, CreditRisk+) by industrial sector

                                     Portfolio characteristics    Sector parameters
k    Sector                          #CP       EAD (%)            R_k^2     σ_k^2
1    Basic materials                 16        1.7                0.070     0.42
2    Communication                   5         2.5                0.045     0.29
3    Cyclical consumer goods         4,631     19.5               0.058     0.36
4    Noncyclical consumer goods      15        1.5                0.048     0.27
5    Diversified companies           28        3                  0.040     0.19
6    Energy                          10        4.3                0.075     0.40
7    Finance                         146       45.9               0.050     0.46
8    Industry                        75        11.1               0.050     0.30
9    Technology                      19        1.8                0.046     0.26
10   Utilities                       55        8.7                0.082     0.72


5 Assigning an obligor to more than one sector would cause serious problems in the CreditMetrics 
framework, since, in general, the distribution of the asset value (2) is unknown if the copula of X 
is not Gaussian. 

6 For readers who are interested in the effect of stochastic LGDs, we refer to Gundlach and Lehrbass 
[22, Sect. 7] or Altman [27]. 
$\sigma_k^2$ represents the uncertainty about possible PD changes within the sector. Therefore, the risk related to a particular sector increases with $\sigma_k^2$.

The basis for the parameter estimation is a data pool containing monthly observations (PD estimates) from 2003 to 2012 for more than 30,000 exchange-traded corporates from all over the world. The individual PD time series, derived from market data (equity prices and liabilities) via a Merton model (see Merton [28]), are aggregated at the sector level by averaging. In order to take time dependencies into account, we fitted a univariate autoregressive process to every sector time series.
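For readers unfamiliar with market-implied PDs, here is a minimal Python sketch of the textbook Merton default probability and of the sector averaging described above. It assumes that asset values and asset volatilities are already known; in practice they must first be backed out of equity prices and liabilities, a step omitted here. The firm data and function names are hypothetical, and the procedure actually used to build the data pool may differ.

```python
import math
from statistics import NormalDist, fmean


def merton_pd(asset_value, debt, asset_vol, drift=0.0, horizon=1.0):
    """P(V_T < D) for assets following geometric Brownian motion (textbook Merton model)."""
    d2 = (math.log(asset_value / debt) + (drift - 0.5 * asset_vol ** 2) * horizon) / (
        asset_vol * math.sqrt(horizon)
    )
    return NormalDist().cdf(-d2)


def sector_pd(firms):
    """Aggregate the firm-level PDs of one sector by simple averaging."""
    return fmean([merton_pd(v, d, s) for v, d, s in firms])


# Hypothetical firms of one sector: (asset value, debt, asset volatility)
firms = [(140.0, 100.0, 0.25), (180.0, 100.0, 0.30), (120.0, 100.0, 0.20)]
print(f"sector PD: {sector_pd(firms):.4f}")
```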


4.2 Parametrization of Marginal Distributions 


In order to fully determine the marginal distributions, we have to specify the sector variances $\sigma_k^2$ for the CreditRisk+ model and the asset correlations for the CreditMetrics model.$^{7}$ The sector variances are estimated based on the autocovariance function of the aggregated sector time series mentioned above, which are normalized such that $E(S_k) = 1$ holds, in order to ensure that the mean of the conditional PD (Eq. (4)) equals the unconditional PD. In the case of the CreditMetrics model, the asset correlation parameters are estimated via a moment-matching approach, such that the first two moments of the conditional PD in both models coincide.$^{8}$ Note that the PD variance $\mathrm{Var}(\mathrm{PD}_i^{CM}(X))$ induced by Eq. (3) for counterparty $i$ in sector $k$ is given by $\Phi_2\left(\Phi^{-1}(\mathrm{PD}_i), \Phi^{-1}(\mathrm{PD}_i); R_k^2\right) - \mathrm{PD}_i^2$, whereas, in the case of CreditRisk+, $\mathrm{Var}(\mathrm{PD}_i^{CR+}(S))$ is simply $\mathrm{PD}_i^2 \sigma_k^2$. Hence, for $k = 1, \ldots, K$ the parameter $R_k^2$ is chosen such that

$$\Phi_2\left(\Phi^{-1}(\mathrm{PD}_k), \Phi^{-1}(\mathrm{PD}_k); R_k^2\right) - \mathrm{PD}_k^2 = \sigma_k^2\, \mathrm{PD}_k^2,$$

where $\mathrm{PD}_k$ denotes the mean of the time series for sector $k$ and $\Phi_2$ is the distribution function of the bivariate standard normal distribution with correlation parameter $R_k^2$.
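The moment-matching condition can be solved with a one-dimensional root search, because the CreditMetrics PD variance increases with $R_k^2$. The Python sketch below evaluates $\Phi_2(a, a; \rho)$ through the one-factor representation $E\left[\Phi\left((a - \sqrt{\rho}\,Z)/\sqrt{1-\rho}\right)^2\right]$ with Simpson's rule and then bisects for $\rho = R_k^2$. The grid size, the search bracket, and the example inputs are illustrative choices, not the authors' implementation.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def phi2_diag(a, rho, n=2001, zmax=8.0):
    """Phi_2(a, a; rho) = E[Phi((a - sqrt(rho) Z) / sqrt(1 - rho))^2] via Simpson's rule in Z."""
    h = 2.0 * zmax / (n - 1)          # n must be odd for Simpson's rule
    sr, s1 = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 1 if i in (0, n - 1) else (4 if i % 2 else 2)
        total += weight * N01.pdf(z) * N01.cdf((a - sr * z) / s1) ** 2
    return total * h / 3.0


def pd_variance_cm(pd, rho):
    """Var(PD^CM(X)) = Phi_2(a, a; rho) - PD^2 with a = Phi^{-1}(PD)."""
    return phi2_diag(N01.inv_cdf(pd), rho) - pd ** 2


def match_r2(pd, sigma2, lo=1e-6, hi=0.95, iters=50):
    """Bisection for R_k^2 such that Var(PD^CM) equals sigma_k^2 * PD_k^2."""
    target = sigma2 * pd ** 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pd_variance_cm(pd, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"R_k^2 = {match_r2(pd=0.01, sigma2=0.3):.4f}")  # hypothetical sector inputs
```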


4.3 Estimation of Copulas 


First, note that the estimations are based on the residuals of the autoregressive processes fitted to every sector PD time series. For a more detailed discussion of this topic, we refer, for instance, to Jakob and Fischer [8].
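The order of the autoregressive processes is not stated here; the sketch below assumes an AR(1) model fitted by ordinary least squares, which is one common choice. It returns the residuals and converts them to pseudo-observations in (0, 1) via ranks, a standard preprocessing step before copula estimation that we assume for illustration. Function names and the simulated series are hypothetical.

```python
import random
from statistics import fmean


def fit_ar1(series):
    """OLS fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi, residuals)."""
    prev, curr = series[:-1], series[1:]
    m_prev, m_curr = fmean(prev), fmean(curr)
    cov = sum((a - m_prev) * (b - m_curr) for a, b in zip(prev, curr))
    var = sum((a - m_prev) ** 2 for a in prev)
    phi = cov / var
    c = m_curr - phi * m_prev
    residuals = [b - (c + phi * a) for a, b in zip(prev, curr)]
    return c, phi, residuals


def pseudo_observations(values):
    """Ranks scaled to (0, 1) via rank / (n + 1); ties are broken by position."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def simulate_ar1(c, phi, n, seed=0):
    """Hypothetical test series: AR(1) with standard normal innovations."""
    rng = random.Random(seed)
    x = [c / (1.0 - phi)]
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


c_hat, phi_hat, res = fit_ar1(simulate_ar1(0.1, 0.7, 120))
print(round(phi_hat, 2), pseudo_observations(res)[:3])
```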


7 In practice, the parametrizations of the two models are very different. The parameters of the CreditRisk+ model are typically estimated based on default data or insolvency rates, whereas in the case of the CreditMetrics model market data are used. Using PD time series based on market data might serve as a compromise in order to compare the results across both models.

8 Please note that $E(\mathrm{PD}_i^{CM}(X)) = E(\mathrm{PD}_i^{CR+}(S)) = \mathrm{PD}_i$.
Table 2 Rounded log-likelihood values for elliptical copulas and GHC

Copula            GC     TC     sym. GHC    asym. GHC
Log-likelihood    634    728    8,848       13,566


4.3.1 Elliptical and Generalized Hyperbolic Copulas 

The parameters of the GC and the TC (as representatives of the elliptical copula class) are estimated via maximum likelihood using the R package “copula” by Hofert et al. [29]. For the TC, we estimated 3.786 degrees of freedom, indicating that a joint exceedance of high quantiles is more likely than under the GC. Generalizing the TC, we also considered the symmetric and the asymmetric$^{9}$ GHC. For parameter estimation, the R package “ghyp” by Luethi and Breymann [30] was used. Please note that, compared to the TC, the sym. GHC has two more parameters due to the generalized inverse Gaussian distribution, which is used as the mixing distribution for the family of generalized hyperbolic distributions, and the asymmetric GHC has another ten parameters because of the skewness vector. The corresponding log-likelihood values are summarized in Table 2. A standard likelihood ratio test indicates that the TC fits the data significantly better than the Gaussian one at every common significance level. Also, the increase in log-likelihood of the asymmetric GHC over its symmetric counterpart is significant. Hence, the stronger dependence between higher PDs captured by the asym. GHC is again significant at every common level.
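The significance statements can be reproduced from the rounded values in Table 2. The Python sketch below computes the likelihood ratio statistic $2(\ell_{\text{full}} - \ell_{\text{restricted}})$ and a $\chi^2$ p-value, using one extra parameter for TC versus GC and ten for the asymmetric versus the symmetric GHC, as stated above. It only supports $\chi^2$ laws with one or an even number of degrees of freedom, which suffices here. As a caveat, the GC is nested in the TC only as the limit of infinitely many degrees of freedom, so the $\chi^2$ approximation is heuristic for that comparison.

```python
import math


def chi2_sf(x, df):
    """Survival function of the chi-square distribution for df = 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):      # Poisson-sum representation for even df
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are supported in this sketch")


def lr_test(loglik_restricted, loglik_full, extra_params):
    """Likelihood ratio statistic and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, extra_params)


# Rounded log-likelihoods from Table 2
print(lr_test(634, 728, 1))         # TC vs. GC: one extra parameter
print(lr_test(8_848, 13_566, 10))   # asym. vs. sym. GHC: ten skewness parameters
```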

Please note that the application of the GHC in practice has several drawbacks. The estimation procedure, the MCECM (multi-cycle expectation conditional maximization) algorithm, is much more difficult to implement and more time-consuming than the estimation of a GC or a TC. Furthermore, the simulation of random numbers is much more computationally intensive due to the quantile functions, which involve the modified Bessel function of the third kind and require numerical integration.

4.3.2 (Hierarchical) Archimedean Copulas 

Out of the Archimedean class, we estimated parameters for the Gumbel, Clayton, Joe, and Frank copulas, but only the Gumbel and Joe copulas provided a reasonable fit to our data. Since our data represent default probabilities, the economic intuition is that dependence increases for higher values, i.e., in times of recession, as can be seen from the empirical data (see Fig. 2). The Gumbel and Joe copulas exhibit positive upper tail dependence,$^{10}$ while their lower tail dependence is zero. Therefore, they are suitable to model this kind of asymmetric dependence. The Frank copula is


9 For the symmetric GHC, we force the skewness parameter $\gamma \in \mathbb{R}^K$ to be zero for all components (notation according to Luethi and Breymann [30]).

10 The coefficients of upper (lower) tail dependence are defined by $\lambda_U := \lim_{u \uparrow 1} P\left(X_2 > F_2^{-1}(u) \mid X_1 > F_1^{-1}(u)\right)$ and $\lambda_L := \lim_{u \downarrow 0} P\left(X_2 < F_2^{-1}(u) \mid X_1 < F_1^{-1}(u)\right)$, respectively.
Fig. 1 First level of R-vine (with parameters of Gumbel and Joe copulas) and hierarchical Archimedean copula (Gumbel) estimated from default data (panels: R-vine (first level), HAC; nodes: sectors)


tail independent, whereas the Clayton copula possesses only lower tail dependence. Applying goodness-of-fit tests (see Genest et al. [31]), we have to reject both of these copulas (Frank and Clayton) even at significance levels considerably below 1 %. In addition, we considered hierarchical Archimedean constructions. With the help of the “HAC” package by Okhrin and Ristig [20], a stepwise ML estimation procedure was used to estimate the tree of the Gumbel HAC depicted in Fig. 1. The figure shows that the dependence parameters range from 4.35 at the bottom of the tree, indicating the strongest dependence, to 1.21 at the top. For the ordinary Gumbel copula, we estimate a parameter value of 1.836, which lies within the range of the HAC parameters. Since the selection of variates on each level of the HAC tree is based on empirical values of Kendall’s τ, the structures of the two HACs (Gumbel and Joe) coincide.
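The Gumbel parameters quoted above translate into Kendall’s τ and upper tail dependence via the closed forms $\tau = 1 - 1/\theta$ and $\lambda_U = 2 - 2^{1/\theta}$; the tail dependence formula also holds for the Joe copula. The short Python sketch below (function names are ours) evaluates them for the reported values 1.21, 1.836, and 4.35. For $\theta = 1.836$ it gives $\lambda_U \approx 0.54$, in line with footnote 14.

```python
def gumbel_tau(theta):
    """Kendall's tau of a Gumbel copula with parameter theta >= 1."""
    return 1.0 - 1.0 / theta


def upper_tail_dependence(theta):
    """lambda_U = 2 - 2**(1/theta); valid for both the Gumbel and the Joe copula."""
    return 2.0 - 2.0 ** (1.0 / theta)


for theta in (1.21, 1.836, 4.35):  # HAC top, ordinary Gumbel, HAC bottom
    print(f"theta={theta:5.3f}  tau={gumbel_tau(theta):.3f}  "
          f"lambda_U={upper_tail_dependence(theta):.3f}")
```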


4.3.3 Pair Copula Construction (PCC) 

In general, a PCC arises from a nonunique decomposition of a multivariate distribution into a product of conditional bivariate distributions, characterized by so-called vines. The estimation algorithm of a PCC generally consists of three major steps:

(I) specification of a valid vine structure (e.g., C-, D-, or R-vine tree),

(II) selection of the types of the underlying bivariate copulas for the tree in (I) (e.g., GC or Gumbel copula),

(III) estimation of the parameters of the copulas selected in (II).

Brechmann and Schepsmeier [32] describe several algorithms addressing all these issues. In particular, the vine structure is specified with the help of maximum spanning trees, where on each level a tree is selected such that the sum of the (absolute) empirical Kendall’s τ over the selected pairs of variables is maximized. To determine a particular copula for each selected pair out of a set of candidates, the AIC is applied.
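The first-tree selection in step (I) can be illustrated in a few lines of Python: compute pairwise Kendall’s τ and pick the spanning tree that maximizes the sum of $|\tau|$, here with Kruskal's algorithm. This is a simplified sketch of the idea only, not the VineCopula implementation; higher trees (which work on conditional pseudo-observations and must satisfy the proximity condition) and the AIC-based copula selection are omitted, and the τ matrix in the example is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equally long samples without ties (O(n^2) pair count)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def max_spanning_tree(tau):
    """Kruskal's algorithm with edge weights |tau[i][j]|; returns the chosen edges."""
    d = len(tau)
    parent = list(range(d))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(combinations(range(d), 2),
                   key=lambda e: abs(tau[e[0]][e[1]]), reverse=True)
    tree = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # edge (i, j) does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical Kendall's tau matrix for four sectors
TAU = [[1.0, 0.6, 0.2, 0.1],
       [0.6, 1.0, 0.5, 0.3],
       [0.2, 0.5, 1.0, 0.4],
       [0.1, 0.3, 0.4, 1.0]]
print(max_spanning_tree(TAU))
```

For this matrix, the selected first tree is the path 0-1-2-3, i.e., the edges with τ = 0.6, 0.5, and 0.4.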


Copula-Specific Credit Portfolio Modeling 




Finally, the copula parameters are estimated via ML. The corresponding steps (I) to (III) are implemented in the R package “VineCopula” (see Schepsmeier et al. [21]), which has been used to determine a PCC for our data set. In order to allow maximum flexibility, we decided to use an R-vine, which generalizes both C- and D-vines. The candidate set for the pair copulas comprises the GC, TC, Gumbel, Clayton, Frank, and Joe copulas.

Analogously to the HAC, the estimation algorithm of the PCC identifies sectors 3 and 8 as those with the strongest dependence. Therefore, these sectors are coupled together on the first level of the R-vine, which means that their pairwise dependence is explicitly modeled by a Gumbel copula with $\theta = 4.35$. In general, all but one of the bivariate copulas on the first level are estimated to be Gumbel, with parameter values in [1.56, 4.35], which is close to the HAC parameter range (see Fig. 1). Only in the case of sectors 5 and 9 is the Joe copula, with parameter 1.87, preferred. Again, the weakest dependence (measured by the implied value of Kendall’s τ) on the first level is related to sector 5. On higher levels, all copulas from the candidate set are selected to model conditional bivariate dependencies.


4.3.4 Parametrization of the CreditRisk+ CBV Copula

For the CBV model, the likelihood function is rather complex, and ML estimation is numerically not feasible. Hence, the parameters of the CBV factor copula are chosen such that the Euclidean distance between the empirical and the theoretical covariance matrices is minimal (see, e.g., Fischer and Dietz [23]).


4.3.5 Illustration for Sectors 3 and 8 

As an example, Fig. 2 shows contour plots of the estimated copula densities between sectors 3 (cycl. consumer goods) and 8 (industry) for different competitors, as well as the (transformed) empirical observations. Note that darker areas indicate a higher concentration of probability mass. In the first row, the elliptical copulas and the GHCs are displayed. Looking at the center of the unit squares, one observes that, in the case of the TC and the asymmetric GHC, more probability mass is concentrated around the main diagonal than for the GC or the symmetric GHC. Since the asymmetric GHC provides a significantly better fit than the TC, the issue of asymmetrically distributed data seems to be more important than the absence of positive tail dependence, at least for our data. This might be caused by the limited sample size of only 120 observations. Although the asymmetric GHC fits significantly better than the symmetric one and the skewness parameters are strictly positive, its density still looks very symmetric.

In contrast, the copula of the CBV model$^{11}$ is extremely concentrated around the main diagonal. Here, observations away from the diagonal have a very low


11 In the case of the CBV copula, the density is estimated via a two-dimensional kernel density estimator.
Fig. 2 Contour of the estimated copulas between sector 3 (cycl. consumer goods) and 8 (industry) together with empirical observations


probability. Please note again that the estimation procedure for this copula is different, which might explain this issue to some extent. For the ordinary Gumbel and Joe copulas, one has to choose a single parameter for all bivariate (and higher-dimensional) dependencies. Therefore, the estimation is always a trade-off between stronger and weaker dependencies. As a result, in our example, the dependence seems to be rather underestimated by both of these copulas compared to their competitors. The HAC overcomes this drawback by using different parameters, which leads to a significantly better fit.


4.4 Effect of the Copula on the Risk Figures and the Tail of the Loss Distribution

Finally, we analyze the impact of the sector copula on the right tail of the loss distribution and, therefore, on the economic capital. Since, in practice, the data sets used for the parametrization of the two model types are rather different and not comparable, we do not compare the absolute values of the risk figures across the two models. Instead, we measure the impact with the help of factors, where the risk figures of the models with the GC are normalized to one. In the case of the CreditRisk+ CBV model, the marginal distributions of the sectors, which follow a weighted sum of Gamma distributions (see Eq. (1)), are replaced, for reasons of simplicity, by Gamma-distributed variates with the same mean and variance. Since this replacement is a monotone transformation of each margin, the dependence structure is not affected. Please note that when drawing the sector realizations$^{12}$ for the CreditMetrics model, we use the survival copula,$^{13}$ because in this case higher values of the sector variates correspond to an increase rather than a decrease of the obligors' creditworthiness.
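The two technical steps in this paragraph, replacing a marginal by the Gamma law with the same mean and variance and switching to the survival copula, are easy to state in code. The Python sketch below (helper names are ours) computes the moment-matched Gamma parameters, checks that a strictly increasing transformation leaves the ranks, and hence the copula, of a sample unchanged, and flips a copula sample via $u \mapsto 1 - u$. Mapping uniforms to Gamma variates additionally requires a Gamma quantile function, which the Python standard library does not provide and which is therefore left out.

```python
import math


def matched_gamma(mean, variance):
    """(shape, scale) of the Gamma law with the given mean and variance."""
    return mean ** 2 / variance, variance / mean


def ranks(values):
    """Rank (1 = smallest) of every entry; ties are broken by position."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def survival_sample(u):
    """Turn a sample u = (u_1, ..., u_K) of a copula C into a sample of its survival copula."""
    return [1.0 - ui for ui in u]


shape, scale = matched_gamma(1.0, 0.105)  # hypothetical sector mean and variance
print(shape, scale)
x = [0.3, 1.7, 0.9, 2.4]
print(ranks(x) == ranks([math.exp(v) for v in x]))  # a monotone map keeps the ranks
print(survival_sample([0.25, 0.75]))
```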

Table 3 summarizes all risk figures. The copulas are ordered according to their impact on the economic capital at the 99.9 % level in the CreditRisk+ model.

First of all, one observes that in the CreditRisk+ framework the risk decreases if we switch from the original model (CBV) to another copula. In both models, the GC implies the lowest risk, followed by the sym. GHC. Although both copulas are elliptically symmetric and tail independent, the risk figures differ by up to 4 %. Applying a TC, the risk rises in both models because of the positive tail dependence of $\lambda_U = 0.69$. For the CreditMetrics model, the markup is around 6 %. The highest risk arises if we use an asymmetric dependence structure, i.e., a (hierarchical) Gumbel or Joe copula, an asym. GHC, a PCC or, in the case of CreditRisk+, the factor copula induced by the CBV model. Therefore, at least for our data set and portfolio, there is an indication that the risk arising from an asymmetric dependence structure, i.e., one in which dependencies are higher during times of recession, is higher than the risk caused by positive tail dependence. In the CreditRisk+ model, even the economic capital in the case of the HAC (Joe) copula is around 8.1 % above that of the model with a GC and 2 % below that of the basic model. In both models, the risk implied by a Joe copula is higher than that implied by a Gumbel copula. Since neither copula exhibits positive lower tail dependence, whereas the upper tail dependence$^{14}$ is higher in the case of the Joe copula, this observation is plausible.

As expected, all portfolio loss distributions exhibit a significant amount of skewness (skew) and kurtosis (kurt), measured by the third and fourth standardized moments, respectively. In addition, we calculated the right quantile weight (RQW) for $\beta = 0.875$, which was recommended by Brys et al. [34] as a robust measure of tail weight and is defined as follows:

$$\mathrm{RQW}(\beta) := \frac{F_L^{-1}\left(\frac{1+\beta}{2}\right) + F_L^{-1}\left(1 - \frac{\beta}{2}\right) - 2F_L^{-1}(0.75)}{F_L^{-1}\left(\frac{1+\beta}{2}\right) - F_L^{-1}\left(1 - \frac{\beta}{2}\right)},$$

where, in our case, $F_L^{-1}$ denotes the quantile function of the portfolio loss distribution.
First, it becomes obvious that the rank order observed for $\mathrm{EC}_{99.9}$ with respect to the copula model is highly correlated with the rank order of the higher moments and of the tail weight. Second, all of the latter statistics derived from the CreditMetrics framework are (significantly) higher than those derived from the CreditRisk+ framework.
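The right quantile weight is simple to evaluate once a quantile function is available. The Python sketch below implements the definition above for an arbitrary quantile function and, for $\beta = 0.875$, an empirical version based on `statistics.quantiles` (all required probabilities, 9/16, 12/16 and 15/16, are multiples of 1/16). As a sanity check, the standard normal distribution gives RQW ≈ 0.249 and the exponential distribution, with its heavier right tail, gives ln(16/7)/ln 7 ≈ 0.425; these reference distributions are our choice and unrelated to the portfolio results.

```python
import math
from statistics import NormalDist, quantiles


def rqw(quantile, beta=0.875):
    """Right quantile weight RQW(beta) for a quantile function F^{-1}."""
    q_hi = quantile((1.0 + beta) / 2.0)
    q_lo = quantile(1.0 - beta / 2.0)
    return (q_hi + q_lo - 2.0 * quantile(0.75)) / (q_hi - q_lo)


def rqw_from_sample(losses):
    """Empirical RQW(0.875) from the 9/16, 12/16 and 15/16 sample quantiles."""
    q = quantiles(losses, n=16)  # q[i - 1] estimates the i/16 quantile
    q_lo, q_mid, q_hi = q[8], q[11], q[14]
    return (q_hi + q_lo - 2.0 * q_mid) / (q_hi - q_lo)


def exponential_quantile(p):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - p)


print(f"normal: {rqw(NormalDist().inv_cdf):.4f}, exponential: {rqw(exponential_quantile):.4f}")
print(f"normal sample: {rqw_from_sample(NormalDist().samples(20_000, seed=1)):.4f}")
```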


12 For details on the simulation of copulas in general, please refer to Mai and Scherer [33]. 

13 For a random vector $u = (u_1, \ldots, u_K)^T$ with copula $C$, the survival copula is defined as the copula of the vector $(1 - u_1, \ldots, 1 - u_K)^T$.

14 The coefficients of upper tail dependence implied by the estimated parameters are 0.54 in case 
of the Gumbel copula and 0.66 for the Joe copula. 
Fig. 3 Right tail of portfolio loss distribution for selected copulas (panels: CreditRisk+, CreditMetrics; horizontal axis: loss percentile (standard model))


Finally, Fig. 3 shows the estimated densities of the portfolio loss for both models and different copulas. The horizontal axis displays the percentiles of the loss distribution of the respective standard model. The ordering of the densities confirms our results derived from the corresponding risk figures.


5 Summary 

Credit portfolio models are commonly used to estimate the future loss distribution of credit portfolios in order to derive the amount of economic capital that has to be allocated to cover unexpected losses. Therefore, capturing the (unknown) dependence between the counterparties of a portfolio, or between the economic sectors to which the counterparties have been assigned, is crucial. For this purpose, copula functions provide a flexible toolbox to specify different dependence structures.

Against this background, we analyzed the effect of different parametric copulas on the tail of the loss distribution and on the risk figures for a hypothetical portfolio in both CreditMetrics and CreditRisk+, two of the most popular credit portfolio models in the financial industry. Our results indicate that the specific CreditRisk+ CBV model uses a rather conservative copula. However, referring to Jakob and Fischer [8], one might come across certain artifacts for this (implicit) copula family. In the CreditMetrics setting, the canonical assumption of a Gaussian copula allows an easy and fast implementation but also gives rise to certain drawbacks, such as the absence of tail dependence (“extreme events occur together”) or the inability to model asymmetric dependence structures, for which we found evidence in the underlying data set. Replacing the Gaussian copula by alternative competitors (Student-t, generalized hyperbolic, PCC, or generalized Archimedean copulas), we could significantly improve the goodness-of-fit to the underlying PD series. As a consequence, using the Gaussian copula might lead to an underestimation of credit risk by up to 10 % (for $\mathrm{EC}_{99.9}$) within the CreditMetrics framework, at least for our calibration. In contrast, the CreditRisk+ model seems to be less sensitive to the dependence structure, because here the markup (relative to the Gaussian copula as benchmark) is around 2 to 4 percentage points lower. The question about the different behavior of the two model types has to be left open for further research.
Implied Recovery Rates — Auctions
and Models 


Stephan Höcht, Matthias Kunze and Matthias Scherer


Abstract Credit spreads provide information about implied default probabilities and recovery rates. Trying to extract both parameters simultaneously from market data is challenging due to identifiability issues. We review existing default models with stochastic recovery rates and try to calibrate them to observed credit spreads. We discuss the mechanisms of credit auctions and compare implied recoveries with realized auction results using the example of Allied Irish Banks (AIB).


1 Introduction 

Corporate credit spreads contain the market’s perception of (at least) two sources of risk: the time of default and the subsequent loss given default or, equivalently, the recovery rate. Default probabilities and recovery rates are unknown parameters, comparable to the volatility in the Black-Scholes model. We address the question whether it is possible to reverse-engineer observed credit spreads and disentangle them into these ingredients. Such a reverse-engineering approach translates market values into model parameters, comparable to the extraction of market-implied volatilities in the Black-Scholes framework. There is a growing literature on implied default probabilities, whereas scientific studies on implied recoveries are sparse. Inferring implied default probabilities from market quotes of credit instruments often relies on the assumption of a fixed recovery rate of, say, $\varphi = 40\,\%$. Subsequently, default probabilities are chosen such that model-implied credit spreads match quoted credit spreads. The assumption $\varphi = 40\,\%$ is close to the market-wide empirical mean (compare Altman et al. [1]) but disregards recovery risk. In many papers, the same recovery rate is assumed for all considered companies, although empirical studies suggest that recoveries are time-varying (compare Altman et al. [2], Bruche and Gonzalez-Aguado [3]), depend on the specific debt instrument, and vary across industry sectors (compare Altman et al. [1]). Obviously, the resulting implied default probability distribution strongly depends on the assumptions about the recovery rate. Since default probabilities and recoveries both enter theoretical spread formulas, we face a so-called identification problem. To make this more tangible, the widely known approximation via the “credit triangle” (see, e.g., Spiegeleer et al. [4, pp. 256]) suggests


$$\text{spread } s = (1 - \varphi)\,\lambda, \qquad (1)$$

where $\varphi$ is the recovery rate and $\lambda$ denotes the default intensity. Obviously, for any given market spread $s$, the implied recovery is a function of (the assumption on) $\lambda$, and vice versa. Using this simplified spread formula alone, it is clearly impossible to reverse-engineer $\varphi$ and $\lambda$ simultaneously from $s$. As we will see, this identification problem also appears in more sophisticated credit models.
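The identification problem is easy to see numerically. In the Python sketch below, a hypothetical spread of 300 bp is matched exactly by three different recovery assumptions, each with its own implied intensity and, assuming a constant intensity, a markedly different 5-year default probability $1 - e^{-\lambda T}$. The numbers are illustrative and are not AIB market data.

```python
import math


def implied_intensity(spread, recovery):
    """Credit triangle s = (1 - phi) * lambda solved for lambda."""
    return spread / (1.0 - recovery)


def default_prob(intensity, horizon):
    """P(tau <= T) for a constant default intensity lambda."""
    return 1.0 - math.exp(-intensity * horizon)


spread = 0.03  # hypothetical 300 bp spread
for phi in (0.2, 0.4, 0.6):
    lam = implied_intensity(spread, phi)
    print(f"phi={phi:.1f}  lambda={lam:.4f}  5y PD={default_prob(lam, 5.0):.3f}  "
          f"model spread={(1 - phi) * lam:.4f}")
```

Every line reproduces the same spread, so the spread alone cannot tell the three parameter pairs apart.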

We raise and (at least partially) answer the following questions:

• Is it possible to simultaneously extract implied recovery rates and implied default 
probabilities (under the risk-neutral measure Q)? 

• How do implied recoveries compare to realized recoveries? 1 

We address the first question using two types of credit models, in which neither the recovery rate nor the default probability distribution is fixed beforehand. As opposed to most existing approaches for the calculation of implied recoveries, both procedures only take into account prices of simultaneously traded assets. Instead of analyzing the spread of one credit instrument at different points in time, implied recoveries and default probabilities are extracted from the term structure of credit spreads. As with the aforementioned implied volatility calculation, this restriction allows for an implied recovery calibration under the risk-neutral measure Q. To analyze the second question, both models are calibrated, as an example, to market data of Allied Irish Banks (AIB), which experienced a credit event in June 2011. Subsequently, realized recovery rates were revealed and can thus be compared to their implied counterparts. In order to clarify how realized recoveries are settled in today’s credit markets, we start by introducing the mechanism of credit auctions.


1 Here, the term realized recovery does not refer to workout recoveries but to a credit auction result. 
The question whether the auction procedure appropriately anticipates workout recoveries is left for 
future research. 
2 CDS Settlement: Credit Auction


CDS are the most common and most liquidly traded single-name credit derivatives; their liquidity usually even exceeds that of the underlying bond market. In case of a credit event, the protection buyer receives a default payment, which approximates the percentage loss of a bondholder subject to this default$^{2}$ (see Schönbucher [5, preface]). This payment is referred to as loss given default (LGD). The corresponding recovery is defined as one minus the LGD. Recoveries are often quoted as rates, e.g., referring to the fraction of par the protection buyer receives after the CDS is settled. Three main types of credit events can be distinguished:

• Bankruptcy: A bankruptcy event occurs if the company in question faces insolvency or bankruptcy proceedings, is dissolved (other than by merger, etc.), liquidated, or wound up.

• Failure-to-pay: This occurs if the company is unable to pay back outstanding obligations in an amount at least as large as a prespecified payment requirement.

• Restructuring: A restructuring event takes place if any clause in the company’s outstanding debt is negatively altered or violated, such that it is legally binding for all debt holders. Not all types of CDS provide protection against restructuring events.

These credit events are standardized by the International Swaps and Derivatives Association (ISDA). The legally binding answer to the question whether or not a specific credit event occurred is given by the so-called Determinations Committees (DC).$^{3}$ ISDA standard CDS contracts as well as the responsible DCs differ among geopolitical regions. As opposed to standard European contracts, the standard North American contract does not provide protection against restructuring credit events. The differences originate from regulatory requirements and the absence of a Chapter 11 equivalent: in order to provide capital relief from a balance sheet perspective, European contracts have to incorporate restructuring events. In what follows, our focus will be on nonrestructuring credit events.

Prior to 2005, CDS were settled physically, i.e., the protection buyer received the contractually agreed notional in exchange for defaulted bonds with the same notional. Accordingly, the corresponding CDS recovery rate was the ratio of the bond’s market value to its par. This procedure exhibited several shortcomings (see Haworth [6, p. 24] or Creditex and Markit [7]):

• For a protection buyer, it was necessary to own the defaulted asset. Often, this entailed an artificial inflation of bond prices after default and became a substantial

2 We will use “credit event” and “default” as synonyms. Note, however, that the terms default 
and credit event are sometimes distinguished in the sense that default is associated with the final 
liquidation procedure. 

3 More information on DCs and ISDA can be found on www.isda.org. 
problem in default events where the notional of outstanding CDS contracts exceeded the par of available bonds by multiples.$^{4}$

• Conversely, the protection seller was obliged to own the defaulted asset after settlement of the CDS. Thus, she or he mandatorily retained a long position with respect to the reference entity’s credit risk, making it less attractive to sell protection.

• Since different bonds generally have different prices, there was no unique settlement price, and two identical CDS contracts were often settled against different recoveries, depending on the liquidity of the associated bond market.

These shortcomings were the initial motivation to alter the standard settlement procedure by introducing an auction-based method. From 2005 to 2013, auctions for the settlement of CDS and LCDS (Loan Credit Default Swap) contracts were held for 112 default events (see Creditex and Markit [8]). On an annual basis, the number of auctions clearly peaked after the financial crisis, i.e., in 2009, when auctions for 45 default events took place. The recovery of a standard CDS contract traded today thus usually refers to the result of an auction held subsequent to a credit event.

The auction mechanism aims at a unique and fair settlement price (recovery). It can be split into two stages: the initial bidding period and a subsequent one-sided Dutch auction. The whole process is administered by Creditex and Markit. In the initial bidding period, each participant, i.e., each protection seller or buyer, represented by one of the bigger investment banks as its dealer, submits a two-way quote. This quote consists of a bid and an offer price for the cheapest-to-deliver bond of the reference entity, together with a one-way physical settlement request. In the one-sided Dutch auction, the unique recovery for all outstanding CDS is assessed as the “fair” value of the cheapest-to-deliver bond relative to its par.$^{5}$ Before the auction starts, a quotation amount, a maximum bid-offer spread, and the cap amount are published by ISDA. These three quantities will be explained as we pass through the auction.


2.1 Initial Bidding Period

All participants submit a two-way quote together with a one-way physical settlement request. The quote refers to the price of the cheapest bond that is listed as a deliverable obligation by ISDA. The request must be in the same direction as the net CDS position, e.g., participants that have net sold protection are not allowed to request delivery of an obligation. Furthermore, the two-way quote must not violate the maximum bid-offer spread. If a dealer does not represent any outstanding CDS positions with respect to the defaulted entity, she or he is not admitted to


4 Sometimes the phenomenon that some bonds were used several times for the settlement of CDS 
is referred to as “recycling.” 

5 Restructuring events differ, since they allow for maturity-specific cheapest-to-deliver bonds.
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participate in the auction. Moreover, the notional of the physical settlement request 
is not allowed to exceed the notional of the outstanding position. 

In the next step, the so-called inside market midpoint (IMM) is calculated as follows:

1. Crossing quotes are canceled. If an offer quote is less than or equal to another bid quote, that bid and that offer are both eliminated.⁶

2. The so-called best halves of the remaining quotes are constructed. The best bid half is the (rounded-up) upper half of the remaining bid quotes. Accordingly, the best offer half contains the same number of the lowest non-canceled offer quotes.

3. The IMM is defined as the average of all quotes in these best halves.
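The three steps can be traced with a short Python sketch. It makes three assumptions: crossing or touching quotes are canceled pairwise, the best half is rounded up, and the result is rounded to the nearest one eighth of a point. `inside_market_midpoint` is a hypothetical helper, not the official Creditex/Markit procedure. The quotes are those of the AIB example in Table 1, regrouped per dealer.

```python
import math


def inside_market_midpoint(bids, offers, tick=0.125):
    """Sketch of the IMM calculation (hypothetical helper, not an official API).

    bids, offers: quotes in % of par, one two-way quote per dealer.
    tick: rounding grid for the result (one eighth of a point assumed).
    """
    bids = sorted(bids, reverse=True)   # highest bid first
    offers = sorted(offers)             # lowest offer first

    # Step 1: cancel crossing (or touching) quotes pairwise.
    while bids and offers and bids[0] >= offers[0]:
        bids.pop(0)
        offers.pop(0)

    # Step 2: best halves (rounded up) of the remaining quotes.
    half = math.ceil(len(bids) / 2)
    best = bids[:half] + offers[:half]

    # Step 3: average of the best halves, rounded to the nearest tick.
    midpoint = sum(best) / len(best)
    return round(midpoint / tick) * tick


# Two-way quotes from Table 1, listed per dealer as (bid, offer).
AIB_QUOTES = {
    "Nomura": (72.00, 74.50), "Goldman Sachs": (71.00, 73.50),
    "Bank of America": (70.50, 73.00), "Barclays": (70.50, 73.00),
    "BNP Paribas": (70.50, 73.00), "HSBC": (70.50, 73.00),
    "RBS": (70.50, 73.00), "Deutsche Bank": (70.00, 72.00),
    "UBS": (70.00, 72.50), "Morgan Stanley": (69.75, 72.25),
    "Credit Suisse": (69.50, 72.00), "JPMorgan": (69.50, 72.00),
    "Societe Generale": (69.00, 71.50), "Citigroup": (68.00, 70.50),
}

bids = [b for b, _ in AIB_QUOTES.values()]
offers = [o for _, o in AIB_QUOTES.values()]
print(inside_market_midpoint(bids, offers))  # 71.375
```

Only Nomura's bid and Citigroup's offer cross. The remaining 13 quotes per side give best halves of seven quotes each.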

Any participant whose bid and offer both violate the IMM has to pay an adjustment amount.⁷ This penalty is intended to ensure that the IMM appropriately reflects the underlying bond market.⁸ The initial bidding period concludes with the calculation of the net open interest, i.e., the netted notional of all physical settlement requests, which is obtained by simple aggregation. If this amount is zero, the IMM is fixed as the auction result and consequently as the recovery for all CDS to be settled via the auction. Otherwise, the IMM serves as a benchmark for the second stage of the auction.
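The aggregation step is a simple netting. The minimal sketch below assumes each request is given as a side and a size: “Bid” is a request to buy the deliverable bond and “Offer” a request to sell it. The helper name is hypothetical. The sizes are those reported in Table 2 for the AIB senior auction.

```python
def net_open_interest(requests):
    """Net the physical settlement requests (hypothetical helper).

    requests: iterable of (side, size) pairs; side "Bid" requests to buy the
    deliverable bond, side "Offer" requests to sell it.
    Returns (auction direction, net size). The direction is None if the
    requests net to zero, in which case the IMM is the final auction price.
    """
    bought = sum(size for side, size in requests if side == "Bid")
    sold = sum(size for side, size in requests if side == "Offer")
    net = round(bought - sold, 10)  # suppress floating-point noise
    if net > 0:
        return "to buy", net
    if net < 0:
        return "to sell", -net
    return None, 0.0


# Physical settlement requests of the AIB senior auction (Table 2, EUR MM).
AIB_REQUESTS = [
    ("Offer", 48.00), ("Offer", 43.90), ("Offer", 11.80),
    ("Bid", 30.00), ("Bid", 52.00), ("Bid", 7.75), ("Bid", 16.00),
]
print(net_open_interest(AIB_REQUESTS))  # ('to buy', 2.05)
```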

To illustrate this first step, we consider the failure-to-pay event of AIB on June 
21, 2011. Two auctions were held, one for senior and one for subordinated CDS 
referring to AIB. We only consider the senior auction. Table 1 displays the submitted 
two-way quotes from all 14 participants. For the calculation of the IMM, the reported 
bid quotes are arranged in descending order, whereas the offers start from the lowest 
quote. 

The first quotes from Nomura (bid) and Citigroup (offer) cancel each other out, since the bid exceeds the offer. Note that this cancelation does not entail a settlement; both quotes are merely disregarded in the IMM calculation. Therefore, 13 bid and 13 offer quotes remain. The best halves are the seven highest bid quotes and the seven lowest offer quotes, which are emphasized in Table 1. Averaging these quotes and rounding to the nearest one eighth yields an IMM of 71.375. The maximum bid-offer spread was 2.50 percentage points and the quotation amount was EUR 2 MM. Table 2 reports the corresponding physical settlement requests.

The aggregated notional of the bid-side physical settlement requests exceeds that of the offer-side requests, so the auction is “to buy”. Since there is net demand for the cheapest-to-deliver senior bond, initial offers below the IMM are considered off-market, and the corresponding dealers have to pay an adjustment amount. In Table 1, only Citigroup’s offer of 70.50 is considered off-market. Its difference to the IMM is 0.875 percentage points. Using the quotation amount as notional, the resulting adjustment amount is EUR 17,500. The second part of the auction aims at satisfying the net physical settlement request, a demand of EUR 2.05 MM.

6 Note that they are not settled; they are merely excluded from the calculation of the IMM.

7 The term “violating” refers to both quotes falling below the IMM (auction is “to buy”) or both exceeding the IMM (auction is “to sell”), respectively.

8 Suppose the net open interest is “to sell”, i.e., there is a surplus on the seller side. If a participant submits a bid exceeding the IMM, he or she is considered off-market, since prices are expected to go down, not up. The participant then has to pay the prefixed quotation amount times the difference between his or her bid and the IMM. The penalty works the other way round for off-market offers if the open interest is “to buy”.

Table 1 Dealer inside market quotes for the first stage of the auction of senior AIB CDS (see Creditex and Markit [8]). Published with the kind permission of © Creditex Group Inc. and Markit Group Limited 2013. All rights reserved

Dealer                            Bid       Offer     Dealer
Nomura Int. PLC                   72.00     70.50     Citigroup Global Markets Ltd.
Goldman Sachs Int.                71.00*    71.50*    Societe Generale
Bank of America N.A.              70.50*    72.00*    Credit Suisse Int.
Barclays Bank PLC                 70.50*    72.00*    Deutsche Bank AG
BNP Paribas                       70.50*    72.00*    JPMorgan Chase Bank N.A.
HSBC Bank PLC                     70.50*    72.25*    Morgan Stanley & Co. Int. PLC
The Royal Bank of Scotland PLC    70.50*    72.50*    UBS AG
Deutsche Bank AG                  70.00*    73.00*    Bank of America N.A.
UBS AG                            70.00     73.00     Barclays Bank PLC
Morgan Stanley & Co. Int. PLC     69.75     73.00     BNP Paribas
Credit Suisse Int.                69.50     73.00     HSBC Bank PLC
JPMorgan Chase Bank N.A.          69.50     73.00     The Royal Bank of Scotland PLC
Societe Generale                  69.00     73.50     Goldman Sachs Int.
Citigroup Global Markets Ltd.     68.00     74.50     Nomura Int. PLC
Resulting IMM                     71.375

All quotes are reported in %. The seven highest remaining bids and the seven lowest remaining offers (marked *) form the best halves.

Table 2 Physical settlement requests for the first stage of the auction of AIB (see Creditex and Markit [8]). Published with the kind permission of © Creditex Group Inc. and Markit Group Limited 2013. All rights reserved

Dealer                            Type        Size in EUR MM
BNP Paribas                       Offer       48.00
Credit Suisse Int.                Offer       43.90
Morgan Stanley & Co. Int. PLC     Offer       11.80
Barclays Bank PLC                 Bid         30.00
JPMorgan Chase Bank N.A.          Bid         52.00
Nomura Int. PLC                   Bid          7.75
UBS AG                            Bid         16.00
Total (net)                       “To buy”     2.05
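In this example, the adjustment amount is the quotation amount times the distance to the IMM in percentage points. The sketch below encodes the rule from footnote 8 for both auction directions. The helper and its signature are hypothetical.

```python
def adjustment_amount(bid, offer, imm, direction, quotation_amount):
    """Penalty for off-market initial quotes (hypothetical helper).

    Quotes are in % of par; quotation_amount is in currency units.
    For a "to buy" auction an offer below the IMM is off-market; for a
    "to sell" auction a bid above the IMM is off-market (footnote 8).
    """
    if direction == "to buy":
        gap = imm - offer   # offers below the IMM are off-market
    elif direction == "to sell":
        gap = bid - imm     # bids above the IMM are off-market
    else:
        raise ValueError("direction must be 'to buy' or 'to sell'")
    # Only a positive gap is penalized; percentage points -> share of notional.
    return max(gap, 0.0) * quotation_amount / 100.0


# Citigroup's initial quote in the AIB senior auction (Table 1), EUR 2 MM quotation amount.
print(adjustment_amount(68.00, 70.50, 71.375, "to buy", 2_000_000))  # 17500.0
```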
2.2 Dutch Auction


This second step is designed as a one-sided Dutch auction, i.e., only quotes in the opposite direction of the net open interest are allowed. If the net open interest is “to sell”, dealers may only submit bid limit orders, and vice versa. For the senior CDS auction of AIB, the net physical settlement request is “to buy”, so only offer limit orders are allowed. As opposed to the first stage of the auction, the size of the submitted orders is not restricted, regardless of the initial settlement request. To prevent manipulation, particularly when the net open interest is low, the prefixed cap amount imposes a further restriction on the possible limit orders; it is usually half of the maximum bid-offer spread. If the auction is “to sell”, orders are bounded from above by the IMM plus the cap amount. If the net open interest is “to buy”, they are bounded from below by the IMM minus the cap amount.

In addition to these new limit orders, the appropriate side of the initial two-way quotes from the first stage is carried over to the second stage, as long as the order does not violate the IMM. All carried-over quotes are assigned the same size, namely the prespecified quotation amount, which was already used to compute the adjustment amount.

Now, all submitted and carried-over limit orders are filled until the net open interest is matched. If the auction is “to sell”, i.e., there is a surplus of bond offerings, the bid limit orders are processed in descending order, starting from the highest quote. Analogously, if the auction is “to buy”, offer quotes are filled starting from the lowest quote. The unique auction price is the last quote that was at least partially filled. Furthermore, the result may not exceed 100 %.⁹
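The filling rule can be sketched for the “to buy” case. The text does not spell out several details, so the sketch assumes them:

- offers below the IMM minus the cap amount are clamped to that bound;
- orders at the marginal price are filled pro rata;
- the cap amount for AIB is 1.25 points, i.e., half of the 2.50-point maximum bid-offer spread.

`LimitOrder` and `dutch_auction_to_buy` are hypothetical names. The orders are the rows of Table 3 that are reproduced there.

```python
from dataclasses import dataclass


@dataclass
class LimitOrder:
    dealer: str
    price: float  # quote in % of par
    size: float   # notional in EUR MM


def dutch_auction_to_buy(orders, open_interest, imm, cap_amount):
    """Sketch of the one-sided second stage for a "to buy" net open interest.

    Offers are filled from the lowest quote upwards until the open interest
    is matched; the last (at least partially) filled quote sets the price.
    Assumptions: offers below imm - cap_amount are clamped to that bound,
    and orders at the marginal price level are filled pro rata.
    """
    floor = imm - cap_amount
    book = sorted(((max(o.price, floor), o) for o in orders), key=lambda t: t[0])
    remaining, allocation, final_price, i = open_interest, {}, None, 0
    while remaining > 1e-12 and i < len(book):
        level = book[i][0]
        at_level = [o for p, o in book if p == level]  # contiguous in sorted book
        i += len(at_level)
        level_size = sum(o.size for o in at_level)
        share = min(1.0, remaining / level_size)  # pro rata at the marginal level
        for o in at_level:
            allocation[o.dealer] = allocation.get(o.dealer, 0.0) + share * o.size
        remaining -= share * level_size
        final_price = level
    if remaining > 1e-9:
        raise ValueError("offers do not cover the net open interest")
    return min(final_price, 100.0), allocation  # result capped at 100 % of par


orders = [
    LimitOrder("JPMorgan Chase Bank N.A.", 70.125, 2.05),
    LimitOrder("Barclays Bank PLC", 70.125, 2.05),
    LimitOrder("Credit Suisse Int.", 70.25, 2.05),
    LimitOrder("BNP Paribas", 70.25, 1.00),
    LimitOrder("BNP Paribas", 70.375, 1.05),
    LimitOrder("Citigroup Global Markets Ltd.", 71.375, 2.00),
    LimitOrder("Nomura Int. PLC", 75.00, 2.00),
]
price, fills = dutch_auction_to_buy(orders, open_interest=2.05, imm=71.375, cap_amount=1.25)
print(price)  # 70.125
print(fills)
```

Under these assumptions, the two lowest offers share the EUR 2.05 MM equally, and the auction clears at 70.125.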

Reconsider the credit event auction for outstanding senior AIB CDS. Table 3 reports both the carried-over offer quotes (type “First”) and the offers from the second stage (type “Second”).

Recall that the net physical settlement request was EUR 2.05 MM. The first two orders were therefore partially filled. Both limit orders were at 70.125 %, which is consequently fixed as the final auction result. All outstanding senior CDS for AIB were thus settled at a recovery rate of 70.125 %. After an auction, all protection buyers who chose physical settlement beforehand must deliver one of the deliverable obligations in exchange for par. Naturally, they are interested in choosing the cheapest among all possible deliverables. Thus, in case of a default, protection buyers are long a cheapest-to-deliver option (compare, e.g., Schönbucher [5, p. 36]), which enhances their position. Details about the value of that option can be found in Haworth [6, pp. 30-32] and Jankowitsch et al. [9].


9 For Northern Rock Asset Management, the European DC resolved that a restructuring credit event occurred on December 15, 2011. Two auctions took place on February 2, 2012, and the first one would theoretically have led to an auction result of 104.25 %. Consequently, the recovery was fixed at 100 % (compare Creditex and Markit [8]).
Table 3 Limit orders for the senior auction of AIB (see Creditex and Markit [8]). Published with the kind permission of © Creditex Group Inc. and Markit Group Limited 2013. All rights reserved

Dealer                            Type      Quote (%)   Size (EUR MM)   Aggregated size (EUR MM)
JPMorgan Chase Bank N.A.          Second    70.125      2.05            2.05
Barclays Bank PLC                 Second    70.125      2.05            4.10
Credit Suisse Int.                Second    70.25       2.05            6.15
BNP Paribas                       Second    70.25       1.00            7.15
BNP Paribas                       Second    70.375      1.05            8.20
Citigroup Global Markets Ltd.     First     71.375      2.00            10.20
...
Nomura Int. PLC                   Second    75          2.00            42.25


2.3 Summary of the Auction Procedure 

The auction-based settlement of CDS is designed to approximate the loss on the cheapest-to-deliver bond. The term “CDS auction” might thus be misleading: it is in fact an auction that determines the market value of the cheapest bond in a set of deliverables. Consequently, the recovery rate of a CDS contract is the market value of this bond divided by its par.

In the above example, JPMorgan’s and Barclays’ orders were the only ones filled. Both dealers had considerable physical settlement requests of EUR 52 MM and EUR 30 MM, respectively, possibly reflecting long CDS positions. By each submitting the lowest possible quote for a notional of EUR 2.05 MM, both dealers pushed the recovery down as far as possible. If both parties indeed represented large long CDS positions, they profited from the low net open interest. Moreover, the final auction result was below the IMM. Had a dealer already quoted the final auction result as an offer in the first stage, she or he would have been considered off-market and penalized.

Another problem appeared during a restructuring credit event of SNS Bank, where senior and subordinated CDS were settled in the same auction. Due to government intervention, subordinated bond holders experienced a full write-down (“bail-in”) before the auction. Thus, no subordinated deliverables remained. Senior and subordinated CDS therefore had the very same recovery (either 95.5 or 85.5 %, depending on the maturity of the CDS). This contradicts the intended link between the subordinated bond holders’ loss and the subordinated CDS recovery. Another counterintuitive auction result concerned the settlement of CDS referring to Fannie Mae or Freddie Mac, where subordinated contracts recovered more than senior ones. Moreover, since the members of the determination committees and the dealers are big investment banks, conflicts of interest may arise when determining whether a credit event has occurred.

These are reasons for an ongoing discussion about whether this one-sided auction design is fair (compare Du and Zhu [10] for the proposal of an alternative auction design). ISDA is currently working on a further supplement to the credit derivative definitions. Among other things, it introduces a new credit event as a solution to what happened with subordinated SNS CDS.


3 Examples of Implied Recovery Models 

As explained above, the recovery of a CDS, $\Phi_\tau \in [0, 1]$, refers to the result of an auction held after a credit event at time $\tau$. It is designed to approximate the relative “left-over” for a bond holder. Before a default event and the subsequent auction take place, this recovery is unknown. One way to assess this quantity for nondefaulted securities is to reverse-engineer implied recoveries from market CDS quotes. Any basic pricing approach for the “fair” spread $s_T$ of a CDS with maturity $T > 0$ is of the form

$$s_T = \mathbb{E}^{\mathbb{Q}}\left[f(\tau, \Phi_\tau)\right]. \tag{2}$$

That is, the spread is the risk-neutral expectation of a function of the default time (or default probability, respectively) and the recovery rate in case of default. Two models are revisited, each specifying $\tau$ and $\Phi_\tau$. Each is calibrated by minimizing the root mean squared error (RMSE) between $\mathbb{E}^{\mathbb{Q}}[f(\tau, \Phi_\tau)]$ and market spreads over a term structure of CDS spreads.


3.1 Cox-Ingersoll-Ross Type Reduced-Form Model

This reduced-form model resembles the one presented in Jaskowski and McAleer [11], although it is applied in a different context. All reduced-form models are based on the same principle. The time of a credit event $\tau$ is the first jump of a stochastic counting process $Z = \{Z_t\}_{t \ge 0}$ with values in $\mathbb{N}_0$, i.e., $\tau = \inf\{t > 0 : Z_t > 0\}$. Here, $Z$ is a Cox process governed by a Cox-Ingersoll-Ross type intensity process $X$, i.e.,

$$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \quad X_0 > 0.$$

The recovery in this model is defined as an exponential function of the intensity process, i.e.,

$$\Phi_t = a\, e^{-X_t},$$



where $a \in [0, 1]$ is referred to as the recovery parameter. A default in a period of high expected distress, e.g., in an economic downturn, entails lower recoveries, and vice versa.

Fig. 1 Weekly average spreads for AIB senior and subordinated CDS with maturities of 1 and 5 years. The spreads represent two whole term structures, which are used to calibrate the presented implied recovery approaches independently in every displayed week

Comparable choices for modeling recoveries can be found, e.g., in Madan et al. [12], Das and Hanouna [13], Höcht and Zagst [14], or Jaskowski and McAleer [11]. Since the model is calibrated to one CDS spread curve, the number of free parameters in the recovery model has to be kept small. In this model, the risk-neutral spread $s_T(\kappa, \theta, \sigma, X_0, a)$ has an integral-free representation. The risk-neutral parameters are determined by minimizing the RMSE. The risk-neutral implied recovery and probability of default then follow from these parameters:


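$$(\kappa^*, \theta^*, \sigma^*, X_0^*, a^*) = \operatorname*{argmin}_{\kappa, \theta, \sigma, X_0, a} \sqrt{\frac{1}{|\mathcal{I}|} \sum_{T \in \mathcal{I}} \left(s_T^{\mathrm{M}} - s_T(\kappa, \theta, \sigma, X_0, a)\right)^2}, \tag{3}$$

where $\mathcal{I}$ is the set of maturities with observable market quotes $s_T^{\mathrm{M}}$ for CDS spreads.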
If both senior and subordinated CDS are available for a defaultable entity, two different recovery parameters $a_{\mathrm{sen}}$ and $a_{\mathrm{sub}}$ are used, while the intensity parameters are shared by both seniorities. This reflects the fact that both CDS types are settled in case of a credit event, although usually in different auctions.¹⁰ In this case, the optimization in Eq. (3) simply matches senior and subordinated spreads simultaneously. For the calibration, we reconsider the example of AIB. Figure 1 shows weekly average quotes for AIB senior and subordinated CDS spreads with maturities of 1 and 5 years.
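For experimentation, the CIR spread can be approximated by simulation. The sketch below substitutes plain Monte Carlo for the integral-free representation mentioned above. It assumes:

- zero interest rates and a continuously paid premium;
- a full-truncation Euler scheme;
- the recovery specification $\Phi_t = a e^{-X_t}$, which is consistent with approximation (4) below.

The function names are hypothetical. `rmse` mirrors the objective of Eq. (3) with a single recovery parameter.

```python
import numpy as np


def cir_cds_spread(kappa, theta, sigma, x0, a, maturity,
                   n_paths=20_000, steps_per_year=252, seed=0):
    """Monte Carlo sketch of s_T = E^Q[f(tau, Phi_tau)] in the CIR model.

    Assumptions of this sketch: zero interest rates, continuously paid
    premium, full-truncation Euler scheme for X, recovery a * exp(-X_t).
    """
    rng = np.random.default_rng(seed)  # fixed seed: common random numbers
    n_steps = max(1, int(round(maturity * steps_per_year)))
    dt = maturity / n_steps
    x = np.full(n_paths, float(x0))
    cum_intensity = np.zeros(n_paths)  # integral of X over [0, t]
    protection = np.zeros(n_paths)     # pathwise protection leg
    annuity = np.zeros(n_paths)        # pathwise risky annuity per unit spread
    for _ in range(n_steps):
        xp = np.maximum(x, 0.0)
        survival = np.exp(-cum_intensity)
        protection += xp * survival * (1.0 - a * np.exp(-xp)) * dt
        annuity += survival * dt
        cum_intensity += xp * dt
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        x = x + kappa * (theta - xp) * dt + sigma * np.sqrt(xp) * dw
    return protection.mean() / annuity.mean()


def rmse(params, maturities, market_spreads, **mc_options):
    """Objective in the spirit of Eq. (3); params = (kappa, theta, sigma, x0, a)."""
    model = np.array([cir_cds_spread(*params, maturity=T, **mc_options)
                      for T in maturities])
    return float(np.sqrt(np.mean((np.asarray(market_spreads) - model) ** 2)))


params = (0.5, 0.05, 0.1, 0.03, 0.6)  # hypothetical kappa, theta, sigma, X_0, a
print([cir_cds_spread(*params, maturity=T, n_paths=5_000) for T in (1.0, 5.0)])
```

The fixed seed makes `rmse` a deterministic function of the parameters, so a derivative-free optimizer could minimize it. For joint senior and subordinated curves, a second recovery parameter would be added, as described in the text.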

Approaching the time of default, a spread widening and an inversion of both the senior and the subordinated term structures can be observed. Calibrating the Cox-Ingersoll-Ross model to AIB CDS quotes for several maturities, independently for each week, yields the implied recoveries and 5-year default probabilities shown in Fig. 2.


10 In the current version of the upcoming ISDA supplement, subordinated CDS may also settle without affecting senior CDS. So far, however, either both settle or neither does.
Fig. 2 Weekly calibration results for the CIR model applied to CDS spreads of AIB before its default in June 2011


Implied senior and subordinated recoveries and implied default probabilities vary substantially over time. One reason is that, since AIB is in distress, term structure shapes and general spread regimes also vary unusually strongly from week to week. Furthermore, the 5-year implied default probability and the implied recoveries co-move. This is because $a$ (the recovery parameter) and $\theta$ (the long-term mean of the default intensity) have a similar effect on long-term CDS spreads. Assuming $X_t = \theta$ for all $t > t^* > 0$, the fair long-term spread can be approximated via

$$s_T \approx c_0 + \left(1 - a e^{-\theta}\right)\theta, \quad \text{for all } T > t^*, \tag{4}$$

where $c_0 \in \mathbb{R}$ is constant. Hence, using this approximation for a given spread $s_T$, the optimal recovery parameter can be seen as a function of the long-term default intensity, denoted by $a^*(\theta)$. This implies a continuum of parameter values $(\kappa^*, \theta, \sigma^*, X_0^*, a^*(\theta))$, $\theta > 0$, which all generate a comparable long-term spread and thus a similar RMSE. Consequently, a minor variation in the quoted spreads might cause a substantial change in the resulting optimal parameters, and thus in the implied recovery and implied probability of default. This is referred to as the identification problem.
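Approximation (4) makes the identification problem concrete. This minimal sketch takes $c_0 = 0$ and a hypothetical long-term spread of 500 bp. Solving (4) for $a$ gives $a^*(\theta) = e^{\theta}\,(1 - (s_T - c_0)/\theta)$, and every pair $(\theta, a^*(\theta))$ below reproduces the same spread. The helper names are hypothetical.

```python
import math


def long_term_spread(a, theta, c0=0.0):
    """Approximation (4): s_T ~ c0 + (1 - a * exp(-theta)) * theta."""
    return c0 + (1.0 - a * math.exp(-theta)) * theta


def recovery_parameter_for(spread, theta, c0=0.0):
    """Solve (4) for a: the curve a*(theta) that reproduces a given spread."""
    return math.exp(theta) * (1.0 - (spread - c0) / theta)


TARGET = 0.05  # hypothetical long-term spread of 500 bp, c0 = 0
for theta in (0.06, 0.08, 0.10, 0.15, 0.20):
    a = recovery_parameter_for(TARGET, theta)
    print(f"theta={theta:.2f}  a*={a:.3f}  "
          f"recovery at X=theta: {a * math.exp(-theta):.3f}  "
          f"spread={long_term_spread(a, theta):.4f}")
```

All rows print the same spread of 0.0500. The implied recovery at $X = \theta$, which here equals $1 - s_T/\theta$, ranges from about 0.17 to 0.75, so spread data alone cannot distinguish these parameter sets.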

The following section contains a framework to circumvent this identification 
problem. 


3.2 Pure Recovery Model 

Two CDS contracts with the same reference entity and maturity but differently ranked reference obligations face the same default probabilities but different recoveries. The general idea of the “pure recovery model” goes back to Ünal et al. [15] and Schläfer and Uhrig-Homburg [16]. The approach exploits this fact by considering the ratio of two differently ranked CDS spreads, which is free of default probabilities. Hence, only spread ratios are modeled, and default probabilities can be neglected. A comparable approach is outlined in Doshi [17]. Let $s^{\mathrm{sen}}$ and $s^{\mathrm{sub}}$ denote the fair spreads of two CDS contracts referring to senior and subordinated debt, respectively. The basic idea can be illustrated using the credit triangle formula from Eq. (1), i.e.,


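$$\frac{s^{\mathrm{sen}}}{s^{\mathrm{sub}}} \approx \frac{(1 - \Phi^{\mathrm{sen}})\lambda}{(1 - \Phi^{\mathrm{sub}})\lambda} = \frac{1 - \Phi^{\mathrm{sen}}}{1 - \Phi^{\mathrm{sub}}}. \tag{5}$$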

Under simplifying assumptions, the ratio of two different types of CDS spreads is a function of the recoveries $\Phi^{\mathrm{sen}}$ and $\Phi^{\mathrm{sub}}$ only. For the credit triangle formula, for instance, the underlying assumptions include independence of $\lambda$ and $\Phi$. The crucial point is to find a suitable and sophisticated model in which this ratio again contains only recovery information. Implied recoveries are then extracted by calibrating ratios of senior and subordinated spreads. We propose a model that allows for time variation in $\Phi$ but no dependence on the default time $\tau$.

In a first step, a company-wide recovery rate $X_T$ is defined, i.e., a recovery for the whole company in case of a default until $T \le T_{\max}$. Here $T_{\max}$ is the largest maturity among all instruments the model should capture. Suppose $\mu_0 \in (0, 1)$, $\mu_1 \in (-1, 1)$, and $\mu_0 + \mu_1 \in (0, 1)$. Furthermore, let $\nu \in (0, 1)$. For a maturity $T_{\max} \ge T > 0$, $X_T$ is assumed to be Beta-distributed with the following expectation and variance:

$$\mathbb{E}^{\mathbb{Q}}[X_T] = \mu(T) := \mu_0 + \mu_1 \sqrt{T / T_{\max}}, \tag{6}$$

$$\mathrm{Var}^{\mathbb{Q}}[X_T] = \sigma^2(T) := \nu\left[\mu(T) - \mu(T)^2\right]. \tag{7}$$

The Beta distribution is a popular choice for modeling stochastic recovery rates, since it allows for a U-shaped density on [0, 1], which has been confirmed empirically for recovery rates. The above parameter restrictions ensure that a Beta distribution with $\mathbb{E}^{\mathbb{Q}}[X_T]$ and $\mathrm{Var}^{\mathbb{Q}}[X_T]$ as above actually exists. The square-root specification allows for a stronger differentiation between maturity-specific recoveries near $T = 0$, a phenomenon that is also widely reflected in CDS market term structures. Overall, this company-wide recovery distribution varies in time but does not depend on $\tau$.
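Given $\mu(T)$ and $\sigma^2(T)$, the Beta shape parameters follow from moment matching. A Beta$(p, q)$ variable has mean $p/(p+q)$ and variance $\text{mean}\,(1-\text{mean})/(p+q+1)$. The sketch below reads Eq. (7) as $\sigma^2(T) = \nu\,\mu(T)(1-\mu(T))$, which gives $p_T + q_T = 1/\nu - 1$. The helper and the parameter values in the usage line are hypothetical.

```python
import math


def beta_parameters(T, T_max, mu0, mu1, nu):
    """Map Eqs. (6)-(7) to Beta(p_T, q_T) shape parameters by moment matching."""
    if not (0 < mu0 < 1 and -1 < mu1 < 1 and 0 < mu0 + mu1 < 1 and 0 < nu < 1):
        raise ValueError("parameter restrictions of Eqs. (6)-(7) violated")
    if not 0 < T <= T_max:
        raise ValueError("maturity must lie in (0, T_max]")
    mu = mu0 + mu1 * math.sqrt(T / T_max)   # Eq. (6)
    var = nu * (mu - mu ** 2)               # Eq. (7)
    # Beta moments: mean = p/(p+q), variance = mean*(1-mean)/(p+q+1)
    total = mu * (1.0 - mu) / var - 1.0     # p + q, equal to 1/nu - 1
    return mu * total, (1.0 - mu) * total


p, q = beta_parameters(T=5.0, T_max=10.0, mu0=0.4, mu1=0.2, nu=0.5)
print(p, q)
```

With $\nu = 0.5$, $p_T + q_T = 1$. Both shape parameters are therefore below one, and the density is U-shaped, in line with the motivation given in the text.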
In a second step, the seniority-specific recoveries $\Phi_T^{\mathrm{sen}}$ and $\Phi_T^{\mathrm{sub}}$ are defined as functions of $X_T$. In legal terms, such a relation is established via a pecking order defined by the Absolute Priority Rule (APR): in case of a default event, any class of debt with a lower priority than another is repaid only if all higher-ranked debt has been repaid in full. Furthermore, all claimants of the same seniority recover simultaneously, i.e., they receive the same proportion of their par value. Let $d_{\mathrm{sec}}$, $d_{\mathrm{sen}}$, and $d_{\mathrm{sub}}$ denote the proportions of secured, senior unsecured, and subordinated unsecured debt, respectively, on the balance sheet of a company at default, such that $d_{\mathrm{sub}} + d_{\mathrm{sen}} + d_{\mathrm{sec}} = 1$. Figure 3 illustrates the APR.
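The pecking order can be written as a small function. This sketch assumes that $X_T$ is the recovered value as a fraction of total debt. That value is distributed first to secured, then to senior unsecured, and finally to subordinated debt. The helper name and the balance-sheet proportions in the usage lines are hypothetical.

```python
import math


def apr_recoveries(x, d_sec, d_sen, d_sub):
    """Seniority-specific recoveries under the absolute priority rule (sketch).

    x: company-wide recovery as a fraction of total debt (assumption).
    d_sec, d_sen, d_sub: balance-sheet proportions summing to one.
    Returns (secured, senior, subordinated) recoveries as fractions of par.
    """
    if not math.isclose(d_sec + d_sen + d_sub, 1.0):
        raise ValueError("debt proportions must sum to one")
    if d_sec < 0 or d_sen <= 0 or d_sub <= 0:
        raise ValueError("senior and subordinated proportions must be positive")
    # With no secured debt, its recovery is irrelevant; 1.0 is a convention here.
    secured = min(x, d_sec) / d_sec if d_sec > 0 else 1.0
    senior = min(max(x - d_sec, 0.0), d_sen) / d_sen
    subordinated = min(max(x - d_sec - d_sen, 0.0), d_sub) / d_sub
    return secured, senior, subordinated


for x in (0.1, 0.5, 0.9):
    print(x, apr_recoveries(x, d_sec=0.2, d_sen=0.6, d_sub=0.2))
```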
Fig. 3 Absolute priority rule: seniority-specific recoveries depend on the stochastic firm-wide recovery and the debt structure of the company

The parameters $d_{\mathrm{sub}}$, $d_{\mathrm{sen}}$, and $d_{\mathrm{sec}}$ determine which proportion of $X_T$ is assigned to senior and subordinated debt holders if a default occurs. Motivated by the linkage of bonds and CDS in the auction mechanism, $\Phi_T^{\mathrm{sen}}$ and $\Phi_T^{\mathrm{sub}}$ are also assumed to be the appropriate CDS recoveries. Note, however, that APR violations often occur in practice and have been widely examined (see, e.g., Betker [18] and Eberhart and Weiss [19]). Under the APR, a general spread representation as in Eq. (2), and independence of $\Phi$ and $\tau$, the recoveries are deterministic functions of the company-wide recovery $X_T$. The ratio of senior to subordinated CDS spreads is then given by

$$\frac{s_T^{\mathrm{sen}}}{s_T^{\mathrm{sub}}} = \frac{\int_0^1 \left(1 - \Phi_T^{\mathrm{sen}}(x)\right) f_{p_T, q_T}(x)\,dx}{\int_0^1 \left(1 - \Phi_T^{\mathrm{sub}}(x)\right) f_{p_T, q_T}(x)\,dx}, \tag{8}$$

where $f_{p_T, q_T}(x)$ denotes the density of a Beta$(p_T, q_T)$-distributed random variable. The variables $p_T$ and $q_T$ are linked to the parameters $\mu_0$, $\mu_1$, and $\nu$ via Eqs. (6) and (7) and the first two moments of the Beta distribution. The parameters $\mu_0$, $\mu_1$, and $\nu$ are calibrated using the above formula. The balance sheet parameters $d_{\mathrm{sec}}$, $d_{\mathrm{sen}}$, and $d_{\mathrm{sub}}$ are taken directly from quarterly reports. Instead of a single spread curve, the calibration matches the theoretical ratios $s_T^{\mathrm{sen}}/s_T^{\mathrm{sub}}(\mu_0, \mu_1, \nu)$ in Eq. (8) to their market counterparts $s_T^{\mathrm{M,sen}}/s_T^{\mathrm{M,sub}}$ across several maturities.
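The spread ratio can be evaluated without numerical integration, because the APR recoveries are piecewise linear in the company-wide recovery. This sketch assumes that the ratio equals the ratio of expected losses given default, $\mathbb{E}[1-\Phi^{\mathrm{sen}}]/\mathbb{E}[1-\Phi^{\mathrm{sub}}]$. The company-wide recovery is Beta$(p, q)$-distributed and mapped to seniorities by the APR in the order secured, senior, subordinated. This is one reading of the independence assumption, not the authors' implementation. The closed form uses SciPy's regularized incomplete beta function `scipy.special.betainc` and the identity $x f_{p,q}(x) = \frac{p}{p+q} f_{p+1,q}(x)$. Helper names are hypothetical.

```python
from scipy.special import betainc


def call_moment(k, p, q):
    """E[(X - k)^+] for X ~ Beta(p, q), via x f_{p,q}(x) = p/(p+q) f_{p+1,q}(x)."""
    mean = p / (p + q)
    if k <= 0.0:
        return mean - k
    if k >= 1.0:
        return 0.0
    # betainc(a, b, x) is the regularized incomplete beta function I_x(a, b).
    return mean * (1.0 - betainc(p + 1.0, q, k)) - k * (1.0 - betainc(p, q, k))


def senior_sub_spread_ratio(p, q, d_sec, d_sen, d_sub):
    """Ratio E[1 - Phi_sen] / E[1 - Phi_sub] with APR recoveries and a
    Beta(p, q) company-wide recovery (sketch under the stated assumptions)."""
    k1, k2 = d_sec, d_sec + d_sen
    # Phi_sen(x) = [(x - k1)^+ - (x - k2)^+] / d_sen and Phi_sub(x) = (x - k2)^+ / d_sub
    exp_loss_sen = 1.0 - (call_moment(k1, p, q) - call_moment(k2, p, q)) / d_sen
    exp_loss_sub = 1.0 - call_moment(k2, p, q) / d_sub
    return exp_loss_sen / exp_loss_sub


print(senior_sub_spread_ratio(0.6, 0.5, d_sec=0.1, d_sen=0.6, d_sub=0.3))
```

To calibrate, one would compute $(p_T, q_T)$ from $(\mu_0, \mu_1, \nu)$ for each quoted maturity by moment matching. One would then minimize the distance between `senior_sub_spread_ratio` and the observed ratios of market spreads.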






S. Hocht et al. 
Fig. 4 Weekly calibration results for the pure recovery model applied to CDS spreads of AIB before its default in June 2011


The resulting risk-neutral implied distribution of the company-wide recovery can be translated into risk-neutral seniority-specific recovery distributions by applying the APR. Furthermore, this implied recovery result could be used to extract implied default probabilities in a second step.

Calibrating the pure recovery model to senior and subordinated spreads of AIB (see Fig. 1) before its default yields implied recoveries for senior and subordinated debt, averaged over all maturities, as displayed in Fig. 4.

As opposed to the Cox-Ingersoll-Ross model, the resulting recoveries do not exhibit sudden jumps and are more stable over time. Only during the last weeks before default (weeks 17 to 7) does the recovery fluctuate, the subordinated recovery in particular. However, this is due to the significant movements of the market spreads, not to an identification problem among the parameters. Moreover, both senior and subordinated recoveries are in line with the later auction results, at least with respect to their proportional relation.


4 Conclusion and Outlook 

Extracting implied recoveries and implied default probabilities in a risk-neutral setting tends to generate unstable parameter estimates. The identification problem between long-term default probabilities and recovery rates is not limited to the presented CIR model. It can also be observed, e.g., in jump-to-default equity models such as the one proposed in Das and Hanouna [13]. We illustrated one way to circumvent the problem: reducing the calibrated expression to a form in which only recovery-related parameters appear. This is possible by considering instruments with different seniorities, such as senior and subordinated CDS.¹¹ Furthermore, the extracted risk-neutral recoveries are more in line with the observed final auction results. More generally, further instruments, e.g., loans or the recently more popular contingent convertibles, could be used in a similar way.
Upside and Downside Risk Exposures
of Currency Carry Trades via Tail 
Dependence 


Matthew Ames, Gareth W. Peters, Guillaume Bagnarosa 
and Ioannis Kosmidis 


Abstract Currency carry trade is the investment strategy of selling low interest rate currencies in order to purchase higher interest rate currencies, thus profiting from the interest rate differentials. This is a well-known financial puzzle. If exposure to foreign exchange risk is uninhibited and markets consist of rational risk-neutral investors, one would not expect profits from such strategies. That is, according to uncovered interest rate parity (UIP), changes in the related exchange rates should offset the potential profit from such interest rate differentials. However, it has been shown empirically that investors can earn profits on average in the following way: borrowing in a country with a lower interest rate, exchanging for foreign currency, and investing in a foreign country with a higher interest rate. This holds even after allowing for any losses from exchanging back to their domestic currency at maturity.

This paper explores the financial risk to which trading strategies seeking to exploit a violation of the UIP condition are exposed through multivariate tail dependence in both the funding and the investment currency baskets. It outlines in which contexts these portfolio risk exposures benefit accumulated portfolio returns and under which conditions such tail exposures reduce portfolio returns.
Keywords Currency carry trade • Multivariate tail dependence • Forward premium puzzle • Mixture models • Generalised Archimedean copula


1 Currency Carry Trade and Uncovered Interest Rate Parity 


One of the most robust puzzles in finance still to be satisfactorily explained is the uncovered interest rate parity puzzle and the associated excess average returns of currency carry trade strategies. These popular trading strategies construct portfolios by selling low interest rate currencies in order to buy higher interest rate currencies, thus profiting from the interest rate differentials. The presence of such profit opportunities, pointed out by [2, 10, 15] and more recently by [5-7, 20, 21, 23], violates the fundamental relationship of uncovered interest rate parity (UIP). UIP is the parity condition under which exposure to foreign exchange risk, with unanticipated changes in exchange rates, is uninhibited. Therefore, if one assumes rational risk-neutral investors, changes in the exchange rates should offset the potential profit from the interest rate differentials between high interest rate (investment) currencies and low interest rate (funding) currencies. This relation can be written more formally by assuming that the forward price, $F_t^T$, is a martingale under the risk-neutral probability $\mathbb{Q}$ ([24]):


$$\mathbb{E}^{\mathbb{Q}}\left[\frac{S_T}{S_t}\right] = \frac{F_t^T}{S_t} = e^{(r_t - r_t^*)(T-t)}. \tag{1}$$


The UIP Eq. (1) thus states that, under the risk-neutral probability, the expected variation of the exchange rate $S_t$ should equal the differential between the interest rates of the two associated countries, denoted by $r_t$ and $r_t^*$, respectively. The currency carry trade strategy investigated in this paper aims at exploiting violations of the UIP relation. It invests a certain amount in a basket of high interest rate currencies (the long basket) while funding it through a basket of low interest rate currencies (the short basket). If UIP holds, then, given foreign exchange market equilibrium, no profit should arise on average from this strategy. However, such opportunities are routinely observed and exploited by large-volume trading strategies.
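A short numerical sketch illustrates Eq. (1) for a single currency pair. Here $S$ is quoted as domestic currency per unit of foreign currency, and the two interest rates are hypothetical. If the spot rate ends exactly at the UIP-implied level $F_t^T$, the carry profit vanishes. If the spot rate does not move, the full rate differential is earned. The helper names are hypothetical, and the basket construction of the paper is not reproduced.

```python
import math


def uip_forward_ratio(r_dom, r_for, tau):
    """Eq. (1): F_t^T / S_t = exp((r_dom - r_for) * tau), S = domestic per foreign."""
    return math.exp((r_dom - r_for) * tau)


def carry_log_excess_return(r_dom, r_for, s_t, s_T, tau):
    """Log excess return of borrowing domestic currency at r_dom, converting at s_t,
    investing abroad at r_for for tau years, and converting back at s_T."""
    return (r_for - r_dom) * tau + math.log(s_T / s_t)


r_low, r_high, tau = 0.01, 0.05, 1.0  # hypothetical funding and investment rates
# Spot ends exactly at the UIP-implied forward: the carry profit vanishes.
print(carry_log_excess_return(r_low, r_high, 1.0, uip_forward_ratio(r_low, r_high, tau), tau))
# Spot does not move: the full rate differential is earned.
print(carry_log_excess_return(r_low, r_high, 1.0, 1.0, tau))
```

The first line prints a value that is zero up to rounding, and the second prints approximately 0.04.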

In this paper, we build on the existing literature by studying a stochastic feature 
of the joint tail behaviours of the currencies within each of the long and the short 
baskets, which form the carry trade. We aim to explore to what extent one can attribute 
the excess average returns with regard to compensation for exposure to tail risk, for 
example either dramatic depreciations in the value of the high interest rate currencies 
or dramatic appreciations in the value of the low interest rate currencies in times of 
high market volatility. 

We postulate that such analyses should benefit from considering not only the marginal behaviours of the processes under study, in this case the exchange rates of currencies in a portfolio, but also the joint dependence features of such relationships, analysed rigorously. We investigate such joint relationships in light of the UIP condition. To achieve this, we study the probability of joint extreme movements in the funding and investment currency baskets. We interpret these extremal tail probabilities as relative risk exposures to adverse and beneficial joint currency movements, which would affect the portfolio returns. This allows us to decompose the relative contribution to the exposure of the portfolio profit into the downside and upside risks contributed by such tail dependence features in each currency basket. We argue that the analysis of the carry trade is better informed by jointly modelling the multivariate behaviour of the marginal processes of currency baskets. Such modelling accounts for potential multivariate extremes whilst still incorporating the heavy-tailed relationships studied in marginal processes.

We fit mixture copula models to vectors of daily exchange rate log returns between 1989 and 2014 for both the investment and the funding currency baskets making up the carry trade portfolio. The method and the dataset used to construct the respective funding and investment currency baskets are described in detail in [1]. The currency compositions of the funding and investment baskets vary daily as a function of the interest rate differential processes for each currency relative to the USD.

Our analysis concludes that the appealing high return profile of a carry portfolio compensates for two features. The first is the tail thickness of each individual component probability distribution. The second is the tendency of extreme returns to occur simultaneously, which makes the portfolio particularly sensitive to the risk known as drawdown. Furthermore, we demonstrate that high interest rate currency baskets and low interest rate currency baskets can display periods during which the tail dependence is inverted. These periods indicate when investors are building up the aforementioned carry positions.


2 Interpreting Tail Dependence as Financial Risk Exposure 
in Carry Trade Portfolios 

In order to fully understand the tail risks of joint exchange rate movements present 
when one invests in a carry trade strategy, we can look at both the downside extremal 
tail exposure and the upside extremal tail exposure within the funding and investment 
baskets that comprise the carry portfolio. The downside tail exposure can be seen 
as the crash risk of the basket, i.e. the risk that one will suffer large joint losses 
from each of the currencies in the basket. These losses would be the result of joint 
appreciations of the currencies that one is short in the low interest rate basket and/or 
joint depreciations of the currencies that one is long in the high interest rate basket. 

Definition 1 (Downside Tail Risk Exposure in Carry Trade Portfolios) Consider the investment currency (long) basket with $d$ exchange rates relative to the base currency, on day $t$, with currency log returns $(X_t^{(1)}, X_t^{(2)}, \ldots, X_t^{(d)})$. Then, the downside tail exposure risk for the carry trade is defined as the conditional probability of adverse




M. Ames et al. 


currency movements in the long basket, corresponding to its upper tail dependence 
(a loss for a long position results from a forward exchange rate increase), given by, 

$$\lambda_U^{(i)}(u) := \Pr\left(X_t^{(i)} > F_i^{-1}(u) \,\middle|\, X_t^{(1)} > F_1^{-1}(u), \ldots, X_t^{(i-1)} > F_{i-1}^{-1}(u), X_t^{(i+1)} > F_{i+1}^{-1}(u), \ldots, X_t^{(d)} > F_d^{-1}(u)\right) \qquad (2)$$


for a currency of interest $i \in \{1, 2, \ldots, d\}$, where $F_i$ is the marginal distribution of asset $i$. Conversely, the downside tail exposure for the funding (short) basket with $d$ currencies is defined as the conditional probability of an adverse currency movement in the short basket (a loss for a short position results from a forward exchange rate decrease), given by


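$$\lambda_L^{(i)}(u) := \Pr\left(X_t^{(i)} < F_i^{-1}(u) \,\middle|\, X_t^{(1)} < F_1^{-1}(u), \ldots, X_t^{(i-1)} < F_{i-1}^{-1}(u), X_t^{(i+1)} < F_{i+1}^{-1}(u), \ldots, X_t^{(d)} < F_d^{-1}(u)\right). \qquad (3)$$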


In general, a basket's upside or downside risk exposure would then be quantified by the probability of a loss (or gain) arising from a joint appreciation or depreciation of magnitude $u$, together with the dollar cost associated with a loss or gain of this magnitude. A standard approach in economics would be to attach a cost function, for example one linear in $u$, to such a probability of loss, giving the downside risk exposure in dollars as $\mathcal{D}_i(u) = C\big(F_i^{-1}(u)\big) \times \lambda_U^{(i)}(u)$, which is a function of the level $u$. In the limits $u \to 0$ or $u \to 1$, the probability term converges to a tail dependence coefficient, which depends only on the copula and not on the marginals.
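The conditional probabilities in Eqs. (2) and (3) can also be estimated directly from data, which gives a useful sanity check on model-based values. The following is a minimal empirical sketch. It assumes rank-based pseudo-observations (rank divided by n + 1) and a hypothetical linear cost C(z) = notional × z applied to the empirical u-quantile move of currency i. The function names are illustrative. The paper itself computes these quantities from the fitted copula rather than empirically.

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform each column of an (n, d) array to (0, 1) using rank / (n + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n within each column
    return ranks / (n + 1.0)

def conditional_exceedance(x, i, u, upper=True):
    """Empirical analogue of Eq. (2) (upper=True) or Eq. (3) (upper=False):
    P(U_i > u | U_j > u for all j != i), or the same with '<' in place of '>'."""
    U = pseudo_observations(x)
    others = np.delete(U, i, axis=1)
    if upper:
        cond = np.all(others > u, axis=1)
        hit = U[:, i] > u
    else:
        cond = np.all(others < u, axis=1)
        hit = U[:, i] < u
    if not cond.any():
        return float("nan")  # no day on which all other currencies exceed the level
    return float(hit[cond].mean())

def dollar_exposure(x, i, u, notional):
    """Hypothetical linear cost C(z) = notional * z evaluated at the empirical
    u-quantile of column i, multiplied by the upper conditional probability."""
    q = float(np.quantile(np.asarray(x, dtype=float)[:, i], u))
    return notional * q * conditional_exceedance(x, i, u, upper=True)
```

Under independence the estimate is close to 1 - u, which gives a baseline against which joint clustering of extremes stands out. At levels near 0 or 1, few days satisfy the conditioning event, so the estimate becomes noisy and may be undefined.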

Conversely, we also define the upside tail exposure, which contributes to profitable returns in the carry trade strategy when extreme movements are in favour of the carry position held. It corresponds precisely to the probabilities discussed above, applied in the opposite direction. That is, the upside risk exposure in the funding (short) basket is given by Eq. (2) and the upside risk exposure in the investment (long) basket is given by Eq. (3). The upside tail exposure of the carry trade strategy is thus the risk that one will earn large joint profits from each of the currencies in the basket. These profits would be the result of joint depreciations of the currencies that one is short in the low interest rate basket and/or joint appreciations of the currencies that one is long in the high interest rate basket.

Remark 1 In a basket with $d$ currencies, $d > 2$, suppose the upside and downside financial risk exposures are computed from a parametric model of these extreme probabilities. If the model is exchangeable, such as an Archimedean copula, then swapping currency $i$ in Eqs. (2) and (3) with another currency $j$ from the basket will not alter the downside or upside risk exposures. If the model is not exchangeable, one can consider upside and downside risks for each individual currency in the carry trade portfolio.

We thus treat these upside and downside tail exposures of the carry trade strategy as features showing that, even though average profits may be made from the violation of UIP, they come at the cost of significant tail exposure.
We can formalise the notion of dependence behaviour in the extremes of the multivariate distribution through the concept of tail dependence, i.e. the limiting behaviour of Eqs. (2) and (3) as $u \uparrow 1$ and $u \downarrow 0$. The interpretation of such quantities is directly relevant to assessing the chance of large adverse movements in multiple currencies, which could significantly increase the risk associated with currency carry trade strategies compared with risk measures that only consider the marginal behaviour of each individual currency. Under certain statistical dependence models, these extreme upside and downside tail exposures can be obtained analytically. Below, we develop a flexible copula mixture example that has such properties.


3 Generalised Archimedean Copula Models for Currency 
Exchange Rate Baskets 

In order to study the joint tail dependence in the investment or funding basket, we consider an overall, parametric model-based tail dependence analysis that uses flexible mixtures of Archimedean copula components. Such a model approach is reasonable since the number of currencies in each of the long basket (investment currencies) and the short basket (funding currencies) is typically 4 or 5.

In addition, these models have the advantage of producing asymmetric dependence in the upper and lower tails of the multivariate model. We consider three models: two Archimedean mixture models and one outer power transformed Clayton copula. The mixture models considered are the Clayton-Gumbel mixture and the Clayton-Frank-Gumbel mixture, where the Frank component allows for periods of no tail dependence within the basket as well as negative dependence. We fit these copula models to each of the long and short baskets separately.

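Definition 2 (Mixture Copula) A mixture copula is a linear weighted combination of copulae of the form:

$$C^M(\mathbf{u}; \boldsymbol{\theta}) = \sum_{i=1}^{N} \lambda_i C_i(\mathbf{u}; \theta_i), \qquad (4)$$

where $0 \le \lambda_i \le 1$ for all $i \in \{1, \ldots, N\}$ and $\sum_{i=1}^{N} \lambda_i = 1$.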

Definition 3 (Archimedean Copula) A d-dimensional copula $C$ is called Archimedean if it can be represented in the form:

$$C(\mathbf{u}) = \psi\left\{\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\right\} \quad \forall\, \mathbf{u} \in [0, 1]^d, \qquad (5)$$

where $\psi$ is an Archimedean generator satisfying the conditions given in [22], and $\psi^{-1}: [0, 1] \to [0, \infty)$ is the inverse generator with $\psi^{-1}(0) = \inf\{t : \psi(t) = 0\}$.
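To make Definitions 2 and 3 concrete, the following sketch evaluates a d-dimensional Archimedean copula from its generator and combines several families into a mixture. It assumes the standard textbook generators for Clayton, Frank and Gumbel in the ψ(t) convention of Eq. (5), with parameters restricted to the ranges where the generators are completely monotone. The dictionary and function names are hypothetical and are not taken from any library.

```python
import math

# Generators psi(t) and inverse generators psi^{-1}(u) for three one-parameter families.
def clayton_psi(t, theta):          # theta > 0
    return (1.0 + theta * t) ** (-1.0 / theta)

def clayton_psi_inv(u, theta):
    return (u ** (-theta) - 1.0) / theta

def frank_psi(t, theta):            # theta > 0 (valid in any dimension)
    return -math.log(1.0 - (1.0 - math.exp(-theta)) * math.exp(-t)) / theta

def frank_psi_inv(u, theta):
    return -math.log((1.0 - math.exp(-theta * u)) / (1.0 - math.exp(-theta)))

def gumbel_psi(t, theta):           # theta >= 1
    return math.exp(-t ** (1.0 / theta))

def gumbel_psi_inv(u, theta):
    return max(0.0, -math.log(u)) ** theta

FAMILIES = {
    "clayton": (clayton_psi, clayton_psi_inv),
    "frank": (frank_psi, frank_psi_inv),
    "gumbel": (gumbel_psi, gumbel_psi_inv),
}

def archimedean_cdf(u, psi, psi_inv, theta):
    """Eq. (5): C(u) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d)), for u in (0, 1]^d."""
    return psi(sum(psi_inv(uj, theta) for uj in u), theta)

def mixture_cdf(u, components):
    """Eq. (4): C_M(u) = sum_i lambda_i * C_i(u; theta_i).
    components: list of (family_name, theta, weight); weights must sum to one."""
    if not math.isclose(sum(w for _, _, w in components), 1.0):
        raise ValueError("mixture weights must sum to one")
    return sum(w * archimedean_cdf(u, *FAMILIES[name], theta)
               for name, theta, w in components)
```

Setting all but one argument to 1 recovers the uniform margin, e.g. `mixture_cdf([0.3, 1.0, 1.0], mix)` returns 0.3 (up to rounding) for any valid mixture `mix`.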
In the following section, we estimate the multivariate model of basket returns in two stages. First, we estimate suitable heavy tailed marginal models for the currency exchange rates (relative to USD). Second, we estimate the dependence structure of the multivariate model composed of multiple exchange rates in the currency baskets for long and short positions.

Once the parametric Archimedean mixture copula model has been fitted to a basket 
of currencies, it is possible to obtain the upper and lower tail dependence coefficients, 
via closed form expressions for the class of mixture copula models and outer power 
transform models we consider. The tail dependence expressions for many common 
bivariate copulae can be found in [25]. This concept was recently extended to the 
multivariate setting by [9] . 

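Definition 4 (Generalised Archimedean Tail Dependence Coefficient) Let $X = (X_1, \ldots, X_d)^T$ be a d-dimensional random vector with distribution $C(F_1(X_1), \ldots, F_d(X_d))$, where $C$ is an Archimedean copula and $F_1, \ldots, F_d$ are the marginal distributions. The coefficients of upper and lower tail dependence are defined respectively as:

$$\lambda_U^{1,\ldots,h|h+1,\ldots,d} = \lim_{u \to 1^-} P\left(X_1 > F_1^{-1}(u), \ldots, X_h > F_h^{-1}(u) \,\middle|\, X_{h+1} > F_{h+1}^{-1}(u), \ldots, X_d > F_d^{-1}(u)\right) = \lim_{t \to 0^+} \frac{\sum_{i=1}^{d} \binom{d}{i} i (-1)^i \psi'(it)}{\sum_{i=1}^{d-h} \binom{d-h}{i} i (-1)^i \psi'(it)}, \qquad (6)$$

$$\lambda_L^{1,\ldots,h|h+1,\ldots,d} = \lim_{u \to 0^+} P\left(X_1 < F_1^{-1}(u), \ldots, X_h < F_h^{-1}(u) \,\middle|\, X_{h+1} < F_{h+1}^{-1}(u), \ldots, X_d < F_d^{-1}(u)\right) = \lim_{t \to \infty} \frac{d}{d-h} \, \frac{\psi'(dt)}{\psi'((d-h)t)}, \qquad (7)$$

where $\psi(\cdot)$ is the generator of the model, $\psi'(\cdot)$ is its first derivative, and the limits in $t$ follow from the substitution $t = \psi^{-1}(u)$.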

In [9], the analogous form of the generalised multivariate upper and lower tail 
dependence coefficients for outer power transformed Clayton copula models is pro- 
vided. The derivation of Eqs. (6) and (7) for the outer power case follows from [12], 
i.e. the composition of a completely monotone function with a non-negative func- 
tion that has a completely monotone derivative is again completely monotone. The 
densities for the outer power Clayton copula can be found in [1]. 
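For the Clayton and Gumbel generators, the limits in Eqs. (6) and (7) can be evaluated in closed form. The sketch below works them out under the textbook generators ψ(t) = (1 + θt)^(-1/θ) (Clayton) and ψ(t) = exp(-t^(1/θ)) (Gumbel). The mixture helper is restricted to the special case d - h = 1, i.e. conditioning on a single currency. In that case every component has the same denominator (u or 1 - u), so the mixture coefficient is the weighted sum of the component coefficients. This is an illustrative derivation under those assumptions, not the paper's implementation, and the function names are hypothetical.

```python
from math import comb

def clayton_lower_td(theta, d, h):
    """Eq. (7) for Clayton (theta > 0): d/(d-h) * psi'(dt)/psi'((d-h)t)
    tends to ((d - h)/d)^(1/theta) as t -> infinity."""
    return ((d - h) / d) ** (1.0 / theta)

def gumbel_upper_td(theta, d, h):
    """Eq. (6) for Gumbel (theta >= 1). As t -> 0+, i*psi'(it) behaves like
    -(1/theta) * t^(1/theta - 1) * i^(1/theta); the common factor cancels,
    leaving a ratio of finite alternating sums."""
    if theta == 1.0:
        return 0.0  # independence copula: no upper tail dependence
    a = 1.0 / theta
    num = sum(comb(d, i) * (-1) ** i * i ** a for i in range(1, d + 1))
    den = sum(comb(d - h, i) * (-1) ** i * i ** a for i in range(1, d - h + 1))
    return num / den

def cfg_mixture_td(w_clayton, theta_c, w_gumbel, theta_g, d, h):
    """(lower, upper) tail dependence of a Clayton-Frank-Gumbel mixture, d - h == 1 only.
    Frank has no tail dependence, Clayton no upper and Gumbel no lower tail dependence,
    so a single component survives in each tail."""
    if d - h != 1:
        raise ValueError("weighted-sum form only derived here for d - h == 1")
    return (w_clayton * clayton_lower_td(theta_c, d, h),
            w_gumbel * gumbel_upper_td(theta_g, d, h))
```

For d = 2 and h = 1, these reduce to the familiar bivariate values 2^(-1/θ) for the Clayton lower tail and 2 - 2^(1/θ) for the Gumbel upper tail.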

The above definitions of model-based parametric upper and lower tail dependence give estimates of joint extreme deviations in the whole currency basket. In practice, it is often useful to understand which pairs of currencies within a given basket contribute significantly to the downside or upside risks of the overall basket. In the class of Archimedean-based mixtures we consider, exchangeability precludes decomposing the total basket downside and upside risks into individual currency-specific components. Specifically, we aim to decompose, say, the downside risk of the funding basket into contributions from each pair of currencies in the basket. We do this via a simple linear projection onto particular subsets of currencies in the portfolio that are of interest, which leads, for example, to the following expression:


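$$\mathbb{E}\left[\hat{\lambda}_U^{i|1,\ldots,i-1,i+1,\ldots,d} \,\middle|\, \hat{\lambda}_U^{2|1}, \hat{\lambda}_U^{3|1}, \hat{\lambda}_U^{3|2}, \ldots, \hat{\lambda}_U^{d|d-1}\right] = \alpha_0 + \sum_{l=1}^{d-1} \sum_{j=l+1}^{d} \alpha_{j|l}\, \hat{\lambda}_U^{j|l}, \qquad (8)$$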


where $\hat{\lambda}_U^{i|1,\ldots,i-1,i+1,\ldots,d}$ is a random variable, since it is based on parameters of the mixture copula model which are themselves functions of the data and therefore random variables. Such a simple linear projection then allows one to interpret directly the marginal linear contributions of particular pairs of currencies in the basket to the model-based upside or downside risk exposure of the basket, via the coefficients $\alpha_{j|l}$, i.e. the projection weights. To perform this analysis, we need estimates of the pairwise tail dependence in the upside and downside risk exposures, $\lambda_U^{j|l}$ and $\lambda_L^{j|l}$, for each pair of currencies $j, l \in \{1, 2, \ldots, d\}$. We obtain these through non-parametric (model-free) estimators, see [8].
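The projection in Eq. (8) amounts to an ordinary least-squares regression of the model-based basket tail dependence on the pairwise estimates. Below is a minimal NumPy sketch. It assumes one row per day and one column per currency pair, and reports conventional OLS standard errors of the kind shown in parentheses in Table 2. The function name is hypothetical.

```python
import numpy as np

def pairwise_projection(basket_td, pairwise_td):
    """OLS fit of Eq. (8): basket_td[t] ~ alpha_0 + sum_p alpha_p * pairwise_td[t, p].
    basket_td: shape (T,), model-based basket tail dependence per day.
    pairwise_td: shape (T, P), non-parametric pairwise tail dependence, one column per pair.
    Returns (alpha, std_err, r_squared), with alpha[0] the intercept alpha_0."""
    y = np.asarray(basket_td, dtype=float)
    X = np.column_stack([np.ones(y.shape[0]), np.asarray(pairwise_td, dtype=float)])
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha
    T, m = X.shape
    sigma2 = float(resid @ resid) / (T - m)          # residual variance estimate
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r_squared = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return alpha, std_err, r_squared
```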

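Definition 5 (Non-Parametric Pairwise Estimator of Upper Tail Dependence (Extreme Exposure))

$$\hat{\lambda}_U^{(k)} = 2 - \min\left\{2, \frac{\log \hat{C}_n\left(\frac{n-k}{n}, \frac{n-k}{n}\right)}{\log\left(\frac{n-k}{n}\right)}\right\}, \qquad k = 1, 2, \ldots, n-1, \qquad (9)$$

where $\hat{C}_n(u_1, u_2) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(\frac{R_{1i}}{n} \le u_1, \frac{R_{2i}}{n} \le u_2\right)$ and $R_{ji}$ is the rank of the variable in its marginal dimension that makes up the pseudo data.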

To obtain a robust estimator of the upper tail dependence, we used the median of the estimates obtained by setting $k$ to the 1st, 2nd, ..., 20th percentile values. Similarly, $k$ was set to the 80th, 81st, ..., 99th percentiles for the lower tail dependence.
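The estimator of Definition 5, together with the median-over-k rule, can be sketched as follows. Assumptions: pseudo-observations are ranks divided by n as in Definition 5, and "k set to the p-th percentile value" is read as k = floor(p·n/100). The text does not give the corresponding lower tail formula, so only the upper tail estimator is shown. The function names are hypothetical.

```python
import numpy as np

def empirical_copula_diag(u1, u2, level):
    """C_n(level, level): share of days on which both pseudo-observations are <= level."""
    return float(np.mean((u1 <= level) & (u2 <= level)))

def upper_td_log_estimator(x1, x2, percentiles=range(1, 21)):
    """Median over k of Eq. (9), with k = floor(p * n / 100) for p in `percentiles`."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = x1.shape[0]
    u1 = (x1.argsort().argsort() + 1) / n      # R_1i / n
    u2 = (x2.argsort().argsort() + 1) / n      # R_2i / n
    estimates = []
    for p in percentiles:
        k = p * n // 100
        if not 1 <= k <= n - 1:
            continue
        level = (n - k) / n
        c = empirical_copula_diag(u1, u2, level)
        if c <= 0.0:
            estimates.append(0.0)              # log C_n undefined: treat ratio as >= 2
            continue
        ratio = np.log(c) / np.log(level)
        estimates.append(2.0 - min(2.0, ratio))
    return float(np.median(estimates))
```

Comonotone pairs give 1 and countermonotone pairs give 0, the two ends of the estimator's range.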


4 Currency Basket Model Estimations via Inference 
Function for the Margins 


The inference function for margins (IFM) technique introduced in [17] provides a computationally faster method for estimating parameters than full maximum likelihood (i.e. maximising over all model parameters simultaneously), and in many cases it yields a more stable likelihood estimation procedure. The asymptotic relative efficiency of this two-stage estimation procedure compared with maximum likelihood estimation was studied in [16] and [14]. It can be shown that the IFM estimator is consistent under weak regularity conditions.
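The two stages of IFM can be illustrated on a bivariate toy problem. The following sketch is not the paper's model. It assumes exponential margins, fitted by their closed-form MLE, purely to keep Stage 1 short, and a single Clayton copula in Stage 2, maximised over θ with a golden-section search. All function names are hypothetical.

```python
import numpy as np

def clayton_log_density(u, v, theta):
    """Log of the bivariate Clayton copula density (theta > 0)."""
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u ** -theta + v ** -theta - 1.0))

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximise a unimodal scalar function f on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

def ifm_clayton_exponential(x, y):
    """Stage 1: MLE of each exponential margin (rate = 1 / mean), then transform the
    data with the fitted CDFs. Stage 2: maximise the Clayton log-likelihood over theta
    with the marginal parameters held fixed."""
    rate_x, rate_y = 1.0 / np.mean(x), 1.0 / np.mean(y)
    u = 1.0 - np.exp(-rate_x * np.asarray(x, dtype=float))
    v = 1.0 - np.exp(-rate_y * np.asarray(y, dtype=float))
    theta = golden_section_max(lambda t: float(clayton_log_density(u, v, t).sum()), 1e-3, 20.0)
    return rate_x, rate_y, theta
```

Because the margins are fixed before the copula is fitted, the Stage 2 search is one-dimensional here. With a mixture copula, it would run over all copula parameters and weights jointly.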

In modelling parametrically the marginal features of the log return forward exchange rates, we wanted flexibility to capture a broad range of skew-kurtosis relationships as well as the potential for sub-exponential heavy tailed features. In addition, we wished to restrict the models to a selection that allows efficient inference and is easily interpretable. We consider a flexible three-parameter model for the marginal distributions given by the Log-Generalised Gamma distribution (l.g.g.d.), see details in [19], where $Y$ has a l.g.g.d. if $Y = \log(X)$ such that $X$ has a generalised gamma distribution (g.g.d.). The density of $Y$ is given by

$$f_Y(y; k, u, b) = \frac{1}{b\,\Gamma(k)} \exp\left[k\left(\frac{y-u}{b}\right) - \exp\left(\frac{y-u}{b}\right)\right], \qquad (10)$$

with $u = \log(\alpha)$ and $b = \beta^{-1}$, where $\alpha$ and $\beta$ are the scale and power parameters of the g.g.d.; the support of the l.g.g.d. is $y \in \mathbb{R}$.

The g.g.d. admits the LogNormal model as a limiting case (as $k \to \infty$), in which case the l.g.g.d. approaches a Normal model. In addition, the g.g.d. includes the exponential model ($\beta = k = 1$), the Weibull distribution ($k = 1$) and the Gamma distribution ($\beta = 1$).
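Equation (10) is straightforward to evaluate on the log scale. The sketch below also uses the fact that, under Eq. (10), exp((y - u)/b) follows a Gamma(k, 1) distribution, which gives a simple sampler. The helper names are hypothetical.

```python
import math
import random

def lggd_logpdf(y, k, u, b):
    """Log-density of Eq. (10): k*z - exp(z) - log(b) - log(Gamma(k)), z = (y - u)/b."""
    z = (y - u) / b
    return k * z - math.exp(z) - math.log(b) - math.lgamma(k)

def lggd_pdf(y, k, u, b):
    return math.exp(lggd_logpdf(y, k, u, b))

def lggd_sample(n, k, u, b, rng):
    """Draw y = u + b*log(G) with G ~ Gamma(k, 1), so exp((y - u)/b) is Gamma(k, 1)."""
    return [u + b * math.log(rng.gammavariate(k, 1.0)) for _ in range(n)]

if __name__ == "__main__":
    # k = 1 gives the log of a Weibull variable (a Gumbel-type minimum distribution).
    print([round(v, 3) for v in lggd_sample(5, 1.0, 0.0, 0.5, random.Random(1))])
```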

As an alternative to the l.g.g.d. model, we also consider a time series approach to modelling the marginals, given by the GARCH(p,q) model, as described in [3, 4], and characterised by the error variance:

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2. \qquad (11)$$
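Given parameter values, Eq. (11) is a simple recursion. Dividing the residuals by the conditional standard deviation gives the standardised residuals referred to in Stage 2. The following sketch only filters and does not estimate the parameters (the paper fits them with the MFE Toolbox). Setting pre-sample squared residuals and variances to a backcast value is an assumption made here for simplicity. The function names are hypothetical.

```python
import numpy as np

def garch_variance(eps, alpha0, alpha, beta, sigma2_init=None):
    """Conditional variances from Eq. (11):
    sigma2[t] = alpha0 + sum_i alpha[i-1]*eps[t-i]**2 + sum_j beta[j-1]*sigma2[t-j].
    Terms reaching before the sample start use sigma2_init (default: sample variance)."""
    eps = np.asarray(eps, dtype=float)
    init = float(np.var(eps)) if sigma2_init is None else float(sigma2_init)
    sigma2 = np.empty(eps.shape[0])
    for t in range(eps.shape[0]):
        s = alpha0
        for i in range(1, len(alpha) + 1):      # ARCH terms, q = len(alpha)
            s += alpha[i - 1] * (eps[t - i] ** 2 if t >= i else init)
        for j in range(1, len(beta) + 1):       # GARCH terms, p = len(beta)
            s += beta[j - 1] * (sigma2[t - j] if t >= j else init)
        sigma2[t] = s
    return sigma2

def standardised_residuals(eps, alpha0, alpha, beta):
    """eps_t / sigma_t under the given GARCH(p, q) parameters."""
    return np.asarray(eps, dtype=float) / np.sqrt(garch_variance(eps, alpha0, alpha, beta))
```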


4.1 Stage 1: Fitting the Marginal Distributions via MLE 

Estimating the three model parameters of the l.g.g.d. can be challenging because a wide range of parameter values, especially for $k$, can produce similar density shapes (see discussions in [19]). To overcome this complication and make the estimation efficient, we use a profile likelihood approach over a grid of values for $k$: for each value of $k$, MLE is performed over the other two parameters, $b$ and $u$. Differentiating the profile likelihood for a given value of $k$ produces the system of two equations:

$$\exp(\hat{\mu}) = \left[\frac{1}{n}\sum_{i=1}^{n} \exp\left(\frac{y_i}{\hat{\sigma}\sqrt{k}}\right)\right]^{\hat{\sigma}\sqrt{k}}, \qquad \frac{\sum_{i=1}^{n} y_i \exp\left(\frac{y_i}{\hat{\sigma}\sqrt{k}}\right)}{\sum_{i=1}^{n} \exp\left(\frac{y_i}{\hat{\sigma}\sqrt{k}}\right)} - \bar{y} - \frac{\hat{\sigma}}{\sqrt{k}} = 0, \qquad (12)$$

where $n$ is the number of observations, $y_i = \log x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\sigma = b/\sqrt{k}$ and $\mu = u + b \log k$. The second equation is solved directly via a simple root search to give an estimate of $\sigma$, and substituting it into the first equation gives an estimate of $\mu$. Note that for each value of $k$ in the grid, we get the pair of parameter estimates $\hat{\mu}$


Upside and Downside Risk Exposures of Currency Carry Trades via Tail Dependence 




and $\hat{\sigma}$, which can then be plugged back into the profile likelihood to make it purely a function of $k$; the estimator of $k$ is then the grid value with the maximum likelihood score. As a comparison, we also fit the GARCH(1,1) model using the MATLAB MFE Toolbox with its default settings.
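The profile likelihood procedure can be sketched in NumPy under the following assumptions. First, the two equations of Eq. (12) are solved in the (u, b) parametrisation of Eq. (10), which is equivalent under σ = b/√k and μ = u + b log k. Second, in that form the σ equation becomes a strictly decreasing function of b, so bisection stands in for the unspecified root search. Third, the grid of k values is supplied by the caller. The function names are hypothetical.

```python
import math
import numpy as np

def lggd_fit_given_k(y, k, iters=200):
    """Solve Eq. (12) for fixed k, written in terms of (u, b)."""
    y = np.asarray(y, dtype=float)
    ymax, ybar = y.max(), y.mean()

    def weighted_mean(b):
        w = np.exp((y - ymax) / b)           # shifting by max(y) avoids overflow
        return float(w @ y / w.sum())

    def score(b):                            # second equation: decreasing in b
        return weighted_mean(b) - ybar - b / k

    lo = 1e-8                                # score(lo) > 0 for non-constant data
    hi = max(10.0 * math.sqrt(k * y.var()), k * (ymax - ybar)) + 1.0  # score(hi) < 0
    for _ in range(iters):                   # bisection on the root of score(b)
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    # first equation: u = b*log(mean(exp(y/b))) - b*log(k), computed stably
    u = ymax + b * math.log(float(np.mean(np.exp((y - ymax) / b)))) - b * math.log(k)
    return u, b

def lggd_loglik(y, k, u, b):
    z = (np.asarray(y, dtype=float) - u) / b
    return float(np.sum(k * z - np.exp(z)) - z.size * (math.log(b) + math.lgamma(k)))

def lggd_profile_fit(y, k_grid):
    """Return (k, u, b) for the grid value of k with the largest profile log-likelihood."""
    best = None
    for k in k_grid:
        u, b = lggd_fit_given_k(y, k)
        ll = lggd_loglik(y, k, u, b)
        if best is None or ll > best[0]:
            best = (ll, k, u, b)
    return best[1], best[2], best[3]
```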


4.2 Stage 2: Fitting the Mixture Copula via MLE 


In order to fit the copula model, the parameters are estimated by maximum likelihood on the data after conditioning on the selected marginal distribution models and their estimated parameters obtained in Stage 1. These models are used to transform the data, either through the CDF with the l.g.g.d. MLE parameters ($k$, $u$ and $b$) or, for the GARCH model, through the conditional variances to obtain standardised residuals. Therefore, in this second stage of MLE estimation, we aim to estimate either the mixture of one-parameter CFG components with parameters $\boldsymbol{\theta} = (\rho_{Clayton}, \rho_{Frank}, \rho_{Gumbel}, \lambda_{Clayton}, \lambda_{Frank}, \lambda_{Gumbel})$, the mixture of one-parameter CG components with parameters $\boldsymbol{\theta} = (\rho_{Clayton}, \rho_{Gumbel}, \lambda_{Clayton}, \lambda_{Gumbel})$, or the two-parameter outer power transformed Clayton with parameters $\boldsymbol{\theta} = (\rho_{Clayton}, \beta_{Clayton})$. The log-likelihood for the mixture copula models is given generically by:

$$l(\boldsymbol{\theta}) = \sum_{i=1}^{n} \log c\left(F_1(X_{i1}; \hat{\boldsymbol{\eta}}_1), \ldots, F_d(X_{id}; \hat{\boldsymbol{\eta}}_d); \boldsymbol{\theta}\right) + \sum_{i=1}^{n} \sum_{j=1}^{d} \log f_j(X_{ij}; \hat{\boldsymbol{\eta}}_j), \qquad (13)$$

where $\hat{\boldsymbol{\eta}}_j$ denotes the Stage 1 estimates of the marginal parameters of currency $j$.

This optimization is achieved via a gradient descent iterative algorithm which was 
found to be quite robust given the likelihood surfaces considered in these models with 
the real data. Alternative estimation procedures such as expectation-maximisation 
were not found to be required. 
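As a concrete instance of the copula term in Eq. (13), the following sketch evaluates the log-likelihood of a bivariate two-component Clayton-Frank mixture at Stage 1 probability integral transform values. This is a reduced illustration: the paper's CFG model also has a Gumbel component and is fitted to whole baskets. The densities are the standard bivariate Clayton and Frank copula densities, and the function names are hypothetical.

```python
import numpy as np

def clayton_density(u, v, theta):          # theta > 0
    return ((1.0 + theta) * (u * v) ** (-theta - 1.0)
            * (u ** -theta + v ** -theta - 1.0) ** (-2.0 - 1.0 / theta))

def frank_density(u, v, theta):            # theta != 0
    e = np.exp(-theta)
    num = theta * (1.0 - e) * np.exp(-theta * (u + v))
    den = ((1.0 - e) - (1.0 - np.exp(-theta * u)) * (1.0 - np.exp(-theta * v))) ** 2
    return num / den

def mixture_copula_loglik(params, u, v):
    """Copula part of Eq. (13) for w * Clayton(theta_c) + (1 - w) * Frank(theta_f).
    u, v: arrays of probability integral transforms from Stage 1."""
    theta_c, theta_f, w = params
    dens = w * clayton_density(u, v, theta_c) + (1.0 - w) * frank_density(u, v, theta_f)
    return float(np.sum(np.log(dens)))
```

The marginal term of Eq. (13) does not depend on the copula parameters, so under IFM it can be dropped when maximising over (θ_c, θ_f, w).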


5 Exchange Rate Multivariate Data Description and 
Currency Portfolio Construction 


In our study, we fit copula models to the high interest rate basket and the low interest 
rate basket updated for each day in the period 02/01/1989 to 29/01/2014 using log 
return forward exchange rates at one month maturities for data covering both the 
previous 6 months and previous year as a sliding window analysis on each trading 
day in this period. 

Our empirical analysis consists of daily exchange rate data for a set of 34 currency 
exchange rates relative to the USD, as in [23]. The currencies analysed included: 
Australia (AUD), Brazil (BRL), Canada (CAD), Croatia (HRK), Cyprus (CYP), 
Czech Republic (CZK), Egypt (EGP), Euro area (EUR), Greece (GRD), Hungary 
(HUF), Iceland (ISK), India (INR), Indonesia (IDR), Israel (ILS), Japan (JPY), 
Malaysia (MYR), Mexico (MXN), New Zealand (NZD), Norway (NOK), Philippines (PHP), Poland (PLN), Russia (RUB), Singapore (SGD), Slovakia (SKK), Slovenia (SIT), South Africa (ZAR), South Korea (KRW), Sweden (SEK), Switzerland (CHF), Taiwan (TWD), Thailand (THB), Turkey (TRY), Ukraine (UAH) and the United Kingdom (GBP).

We have considered daily settlement prices for each currency exchange rate as 
well as the daily settlement price for the associated 1 month forward contract. We 
utilise the same dataset (albeit starting in 1989 rather than 1983 and running up 
until January 2014) as studied in [20, 23] in order to replicate their portfolio returns 
without tail dependence risk adjustments. Due to differing market closing days, e.g. 
national holidays, there was missing data for a couple of currencies and for a small 
number of days. For missing prices, the previous day’s closing prices were retained. 

As was demonstrated in Eq. (1), the differential of interest rates between two 
countries can be estimated through the ratio of the forward contract price and the 
spot price, see [18] who show this holds empirically on a daily basis. Accordingly, 
instead of considering the differential of risk-free rates between the reference and 
the foreign countries, we build our respective baskets of currencies with respect to 
the ratio of the forward and the spot prices for each currency. On a daily basis, 
we compute this ratio for each of the d currencies (available in the dataset on that 
day) and then build five baskets. The first basket gathers the d/5 currencies with the 
highest positive differential of interest rate with the US dollar. These currencies are 
thus representing the ‘investment’ currencies, through which we invest the money to 
benefit from the currency carry trade. The last basket will gather the d/5 currencies 
with the highest negative differential (or at least the lowest differential) of interest 
rate. These currencies are thus representing the ‘financing’ currencies, through which 
we borrow the money to build the currency carry trade. 
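The daily basket construction can be sketched as a sort on the forward premium, under two assumptions. First, both prices are quoted in units of foreign currency per USD, so that under covered interest parity log(F/S) approximates the interest rate differential of the foreign currency over the USD. Second, when the number of currencies is not a multiple of five, the earlier baskets receive one extra currency. The prices in the usage block are made up, and the function names are hypothetical.

```python
import math

def split_evenly(items, n_groups):
    """Split a list into n_groups consecutive chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), n_groups)
    chunks, start = [], 0
    for g in range(n_groups):
        size = q + (1 if g < r else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def build_baskets(spot, forward, n_baskets=5):
    """Rank currencies by log(F/S) and split them into n_baskets baskets.
    spot, forward: dicts mapping currency code -> price on one day.
    baskets[0] = investment currencies, baskets[-1] = funding currencies."""
    codes = [c for c in spot if c in forward]          # currencies available that day
    premium = {c: math.log(forward[c] / spot[c]) for c in codes}
    ranked = sorted(codes, key=lambda c: premium[c], reverse=True)
    return split_evenly(ranked, n_baskets)

if __name__ == "__main__":
    # Hypothetical one-day prices, for illustration only.
    fwd = {"TRY": 1.010, "BRL": 1.008, "ZAR": 1.006, "MXN": 1.004, "AUD": 1.002,
           "GBP": 1.000, "EUR": 0.999, "SGD": 0.998, "CHF": 0.997, "JPY": 0.996}
    baskets = build_baskets({c: 1.0 for c in fwd}, fwd)
    print("investment:", baskets[0], "funding:", baskets[-1])
```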

Given this classification, we then investigate the joint distribution of each group of currencies to understand the impact of the currency carry trade, embodied by the differential of interest rates, on currency returns. In our analysis, we concentrate on
the high interest rate basket (investment currencies) and the low interest rate basket 
(funding currencies), since typically when implementing a carry trade strategy one 
would go short the low interest rate basket and go long the high interest rate basket. 


6 Results and Discussion 

In order to model the marginal exchange rate log returns, we considered two approaches. First, we fit Log Generalised Gamma models to each of the 34 currencies considered in the analysis, updating the fits for every trading day based on a 6-month sliding window. A time series approach was also considered for the marginals, as is popular in much of the recent copula literature (see for example [4]), using GARCH(1,1) models for the 6-month sliding data windows. In each case we assume approximate local stationarity over these short 6-month time frames.
Table 1 Average AIC for the Generalised Gamma (GG) and the GARCH(1,1) models for the four most frequent currencies in the high interest rate and the low interest rate baskets over the 2001-2014 data period, split into two chunks of about 6 years each

                            2001-2007                     2007-2014
Basket        Currency      GG            GARCH           GG            GARCH
Investment    TRY           356.9 (3.5)   341.1 (21.7)    358.7 (3.0)   349.1 (16.8)
              MXN           360.0 (1.2)   357.04 (3.8)    358.6 (4.0)   344.5 (28.1)
              ZAR           358.7 (3.0)   353.5 (11.4)    358.0 (6.1)   352.8 (12.2)
              BRL           359.0 (2.8)   341.6 (19.4)    360.0 (2.1)   341.6 (23.2)
Funding       JPY           361.2 (0.9)   356.5 (7.2)     356.9 (6.8)   355.0 (7.0)
              CHF           360.8 (1.4)   359.1 (2.9)     358.6 (7.4)   355.4 (8.8)
              SGD           360.0 (2.7)   356.8 (5.7)     360.0 (2.6)   353.7 (7.5)
              TWD           358.7 (6.2)   347.0 (16.4)    359.1 (5.8)   348.5 (13.2)

Standard deviations are shown in parentheses. Similar performance was seen between 1989 and 2001.


A summary of the marginal model selection can be seen in Table 1, which shows the average AIC scores for the four most frequent currencies in the high interest rate and the low interest rate baskets over the data period. Whilst the AIC for the GARCH(1,1) model is consistently lower than the respective AIC for the Generalised Gamma, the standard deviations are large enough that neither model is a clear favourite.

However, when we consider the model selection of the copula in combination 
with the marginal model, we observe lower AIC scores for copula models fitted 
on the pseudo-data resulting from using Generalised Gamma margins than using 
GARCH(1,1) margins. This is the case for all three copula models under consid- 
eration in the paper. Figure 1 shows the AIC differences when using the Clayton- 
Frank-Gumbel copula in combination with the two choices of marginal for the high 
interest rate and the low interest rate basket, respectively. Over the entire data period, 
the mean difference between the AIC scores for the CFG model with Generalised 
Gamma versus GARCH(1,1) marginals for the high interest rate basket is 12.3 and 
for the low interest rate basket is 3.6 in favour of the Generalised Gamma. 

Thus, the Generalised Gamma margins are preferred in our copula modelling context and are used in the remainder of the analysis. We now consider the goodness-of-fit of the three copula models applied to the high interest rate basket and low interest rate basket pseudo data. We compared, via AIC, the three-component mixture CFG model, the two-component mixture CG model and the two-parameter OpC model. One could also use the Copula Information Criterion (CIC), see [13] for details.

The results of this comparison are presented in Fig. 2, which shows the differences in AIC between CFG and CG and between CFG and OpC for each of the high interest rate and the low interest rate currency baskets. It is not unreasonable to use the CFG model for this analysis, since over the entire data period, the mean difference between the AIC scores for the CFG and the CG models



[Figure panels: AIC of CFG model with GenGamma margins minus AIC of CFG with GARCH margins, on the high basket (upper) and on the low basket (lower); negative values mean CFG with GenGamma margins is a better fit]

Fig. 1 Comparison of AIC for Clayton-Frank-Gumbel model fit on the pseudo-data resulting from generalised gamma versus GARCH(1,1) margins. The high interest rate basket is shown in the upper panel and the low interest rate basket is shown in the lower panel


for the high interest rate basket is 1.33 and for the low interest rate basket is 1.62 in 
favour of the CFG. 

However, Fig. 2 shows that during the 2008 credit crisis period, the CFG model performs much better than the CG model. The CFG copula model also provides a much better fit than the OpC model, as shown by the mean differences between the AIC scores of 9.58 for the high interest rate basket and 9.53 for the low interest rate basket. Similarly, the CFG model performs markedly better than the OpC model during the 2008 credit crisis period.


6.1 Tail Dependence Results 

Below, we examine the time-varying parameters of the maximum likelihood fits of the mixture CFG copula model. Here, we focus on the strength of dependence present in the currency baskets, given the particular copula structures in the mixture, which we interpret as the upside/downside tail exposure of a carry trade over time. Figure 3 shows the time-varying upper and lower tail dependence, i.e. the extreme upside and downside risk exposures for the carry trade basket, present in the high interest rate basket under the CFG copula fit and the OpC copula fit. Similarly, Fig. 4 shows this for the low interest rate basket.

Remark 2 (Model Risk and its Influence on Upside and Downside Risk Exposure) In 
fitting the OpC model, we note that independent of the strength of true tail dependence 
Fig. 2 Comparison of AIC for Clayton-Frank-Gumbel model with Clayton-Gumbel and outer power Clayton models on high and low interest rate baskets with generalised gamma margins. The high interest rate basket is shown in the upper panel and the low interest rate basket is shown in the lower panel


in the multivariate distribution, the upper tail dependence coefficient for this model increases strictly and very rapidly with dimension. Therefore, when the OpC model is fitted to baskets larger than bivariate, i.e. from 1999 onwards, the upper tail dependence estimates become very large (even for outer power parameter values very close to 1). This lack of flexibility in the OpC model only becomes

[Figure panel: VIX vs Tail Dependence Present in CFG Copula and OpC Copula in High IR Basket; Upper Tail Dependence]

Fig. 3 Comparison of Volatility Index (VIX) with upper and lower tail dependence of the high interest rate basket in the CFG copula and OpC copula. US NBER recession periods are represented by the shaded grey zones. Some key crisis dates across the time period are labelled
[Figure panel: VIX vs Tail Dependence Present in CFG Copula and OpC Copula in Low IR Basket; Upper Tail Dependence; x-axis: Date]

Fig. 4 Comparison of Volatility Index (VIX) with upper and lower tail dependence of the low interest rate basket in the CFG copula and OpC copula. US NBER recession periods are represented by the shaded grey zones. Some key crisis dates across the time period are labelled


apparent in baskets of dimension greater than 2, but is also evident in the AIC scores 
in Fig. 2. Here, we see an interesting interplay between the model risk associated to 
the dependence structure being fit and the resulting interpreted upside or downside 
financial risk exposures for the currency baskets. 

Focusing on the tail dependence estimate produced from the CFG copula fits, we 
can see that there are indeed periods of heightened upper and lower tail dependence in 
the high interest rate and the low interest rate baskets. There is a noticeable increase 
in upper tail dependence in the high interest rate basket at times of global market 
volatility. Specifically, during late 2007, i.e. the global financial crisis, there is a 
sharp peak in upper tail dependence. Preceding this, there is an extended period of 
heightened lower tail dependence from 2004 to 2007, which could tie in with the 
building of the leveraged carry trade portfolio positions. This period of carry trade 
construction is also very noticeable in the low interest rate basket through the very 
high levels of upper tail dependence. 

We compare in Figs. 3 and 4 the tail dependence plotted against the VIX volatility 
index for the high interest rate basket and the low interest rate basket, respectively, 
for the period under investigation. The VIX is a popular measure of the implied 
volatility of S&P 500 index options — often referred to as the fear index. As such, 
it is one measure of the market’s expectations of stock market volatility over the 
next 30 days. In the high interest rate basket, there are clear upper tail dependence peaks at times of elevated VIX, particularly post-crisis. However, we would not expect the two to match exactly, since the VIX is not a direct measure of global FX volatility. We can thus conclude that investors' risk aversion plays an important role in the tail behaviour. This conclusion corroborates recent literature regarding the skewness and kurtosis features characterising currency carry trade portfolios [5, 11, 23].
6.2 Pairwise Decomposition of Basket Tail Dependence


In order to examine the contribution of each pair of currencies to the overall d-dimensional basket tail dependence, we calculated the corresponding non-parametric
pairwise tail dependencies for each pair of currencies. In Fig. 5, we can see the average 
upper and lower non-parametric tail dependence for each pair of currencies during 
the 2008 credit crisis, with the 3 currencies most frequently in the high interest rate 
and the low interest rate baskets labelled accordingly. The lower triangle represents 
the non-parametric pairwise lower tail dependence and the upper triangle represents 
the non-parametric pairwise upper tail dependence. 

If one were trying to optimise a currency portfolio with respect to the tail risk exposures, i.e. to minimise negative tail risk exposure and maximise positive tail risk exposure, then one would sell short currencies with high upper tail dependence and low lower tail dependence, whilst buying currencies with low upper tail dependence and high lower tail dependence.

Similarly, in Fig. 6 we see the pairwise non-parametric tail dependencies averaged 
over the last 12 months (01/02/2013 to 29/01/2014). Comparing this heat map to the 
heat map during the 2008 credit crisis (Fig. 5), we notice that in general there are 
lower values of tail dependence amongst the currency pairs. 

We regressed the basket tail dependence on the pairwise non-parametric tail dependences for the days in the period 01/02/2013 to 29/01/2014 on which the 3 currencies all appeared in the basket (224 out of 250 for the low interest rate basket and 223 out of 250 for the high interest rate basket). The regression coefficients and R² values can be seen in Table 2. We can



[Figure: heat map, Period = 26-May-2008 to 23-Nov-2009]

Fig. 5 Heat map showing the strength of non-parametric tail dependence between each pair of currencies averaged over the 2008 credit crisis period. Lower tail dependence is shown in the lower triangle and upper tail dependence is shown in the upper triangle. The 3 currencies most frequently in the high interest rate and the low interest rate baskets are labelled
[Figure: heat map, Period = 01-Feb-2013 to 29-Jan-2014; both axes list the currency codes]

Fig. 6 Heat map showing the strength of non-parametric tail dependence between each pair of currencies averaged over the last 12 months (01/02/2013-29/01/2014). Lower tail dependence is shown in the lower triangle and upper tail dependence is shown in the upper triangle. The 3 currencies most frequently in the high interest rate and the low interest rate baskets are labelled


interpret this as the relative contribution of each of the 3 currency pairs to the overall 
basket tail dependence. We note that for the low interest rate lower tail dependence 
and for the high interest rate upper tail dependence, there is a significant degree of 
cointegration between the currency pair covariates and hence we might be able to 
use a single covariate due to the presence of a common stochastic trend. 
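The regression behind Table 2 can be outlined with ordinary least squares. The following Python sketch is a hypothetical illustration, not the authors' code: it assumes NumPy, uses synthetic data in place of the daily tail dependence series, and reports classical (homoskedastic) standard errors, which may differ from the estimator used for Table 2. The helper name `ols_with_se` is our own.

```python
import numpy as np


def ols_with_se(y, X):
    """Regress y on an intercept and the columns of X.

    Returns (coefficients, standard_errors, r_squared); coefficient 0 is the
    constant. Standard errors assume homoskedastic, uncorrelated residuals.
    """
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]                 # observations minus parameters
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # covariance of the estimates
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, np.sqrt(np.diag(cov)), r2


# Hypothetical example: 224 days, three pairwise tail dependence covariates
rng = np.random.default_rng(0)
pairwise_td = rng.uniform(0.0, 0.5, size=(224, 3))
basket_td = 0.2 + pairwise_td @ np.array([0.1, 0.2, 0.4]) + rng.normal(0.0, 0.01, 224)
beta, se, r2 = ols_with_se(basket_td, pairwise_td)
print(beta, se, r2)
```

The first entry of `beta` corresponds to the "Constant" row of Table 2 and the remaining entries to the three currency pair rows.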


Table 2 Basket tail dependence regressed on pairwise non-parametric tail dependence, during the period 01/02/2013 to 29/01/2014 (standard errors are shown in parentheses)

Low IR basket      Upper TD        Lower TD
Constant            0.22 (0.01)     0.71 (0.17)
CHF JPY             0.02 (0.03)    -0.62 (0.25)
CZK CHF             0.18 (0.02)    -0.38 (0.26)
CZK JPY             0.38 (0.05)     0.23 (0.32)
R²                  0.57            0.28

High IR basket     Upper TD        Lower TD
Constant            0.07 (0.01)     0.1 (0.02)
EGP INR            -0.06 (0.33)     0.56 (0.05)
UAH EGP             0.59 (0.08)     0.44 (0.08)
UAH INR             2.37 (0.42)    -0.4 (0.07)
R²                  0.4             0.44

The currency pairs formed from the 3 currencies most frequently in the respective baskets are used as independent variables
6.3 Understanding the Tail Exposure Associated with the Carry Trade and Its Role in the UIP Puzzle

As was discussed in Sect. 2, the tail exposures associated with a currency carry trade strategy can be broken down into the upside and downside tail exposures within each of the long and short carry trade baskets. The downside relative exposure adjusted returns are obtained by multiplying the monthly portfolio returns by one minus the tail dependence present at the corresponding dates: the upper tail dependence in the high interest rate basket and the lower tail dependence in the low interest rate basket, respectively. The upside relative exposure adjusted returns are obtained by multiplying the monthly portfolio returns by one plus the lower tail dependence in the high interest rate basket and the upper tail dependence in the low interest rate basket, respectively, at the corresponding dates. We call these relative exposure adjustments because they adjust only for the tail exposures and we do not quantify a market price per unit of tail risk. Nevertheless, they are informative, as they decompose the relative exposures of the long and short baskets with regard to extreme events.
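The exposure adjustment described above amounts to rescaling each monthly return before accumulating. A minimal Python sketch, assuming NumPy, monthly log returns (so that cumulative log returns are a running sum) and a tail dependence series already aligned with the return dates; the function name and the numbers are hypothetical. Following the text, the downside adjustment for the high interest rate basket would use its upper tail dependence, and for the low interest rate basket its lower tail dependence.

```python
import numpy as np


def adjusted_cumulative_log_returns(monthly_log_returns, tail_dependence, side):
    """Scale each monthly return by (1 - lambda) for a downside adjustment or
    by (1 + lambda) for an upside adjustment, then accumulate.

    Assumes log returns, so cumulative log returns are a running sum.
    `tail_dependence` holds the basket tail dependence coefficient (in [0, 1])
    at each corresponding date.
    """
    r = np.asarray(monthly_log_returns, dtype=float)
    lam = np.asarray(tail_dependence, dtype=float)
    if side == "downside":
        factor = 1.0 - lam  # penalise tail dependence
    elif side == "upside":
        factor = 1.0 + lam  # reward tail dependence
    else:
        raise ValueError("side must be 'downside' or 'upside'")
    return np.cumsum(r * factor)


hml = [0.01, -0.02, 0.015]            # hypothetical monthly HML log returns
upper_td_high_ir = [0.5, 0.25, 0.0]   # hypothetical upper TD of the high IR basket
print(adjusted_cumulative_log_returns(hml, upper_td_high_ir, "downside"))
```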


[Figure: Downside Risk Adjusted Returns for HML basket (penalising tail dependence); x axis: Date]

Fig. 7 Cumulative log returns of the carry trade portfolio (HML = High interest rate basket minus low interest rate basket). Downside exposure adjusted cumulative log returns using upper/lower tail dependence in the high/low interest rate basket for the CFG copula and the OpC copula are shown for comparison


As can be seen in Fig. 7, the relative adjustment to the absolute cumulative returns for each type of downside exposure is greatest for the low interest rate basket. The exception is the OpC model, but this is due to the very poor fit of this model to baskets containing more than 2 currencies, which carries over into the estimated financial risk exposures. This is interesting because, intuitively, one would expect the high interest rate basket to be the largest source of tail exposure. However, one should be careful when interpreting this plot, since we are looking at the extremal tail exposure. The analysis may change if one considered the intermediate tail risk exposure, where the marginal effects become significant. Similarly, Fig. 8 shows that the relative adjustment to the absolute cumulative returns for each type of upside exposure is greatest for the low interest rate basket. The same interpretation as for the downside relative exposure adjustments applies here to the upside relative exposure adjustments.

[Figure: Upside Risk Adjusted Returns for HML basket (rewarding tail dependence); x axis: Date]

Fig. 8 Cumulative log returns of the carry trade portfolio (HML = High interest rate basket minus low interest rate basket). Upside exposure adjusted cumulative log returns using lower/upper tail dependence in the high/low interest rate basket for the CFG copula and the OpC copula are shown for comparison


7 Conclusion 

In this paper, we have shown that the positive and negative multivariate tail risk 
exposures present in currency carry trade baskets are additional factors needing 
careful consideration when one constructs a carry portfolio. Ignoring these exposures
leads to a perceived risk-return profile that does not reflect the true nature of such
a strategy. In terms of marginal model selection, it was shown that one is indifferent
between the log Generalised Gamma model and the frequently used GARCH(1,1) 
model. However, in combination with the three different Archimedean copula models 
considered in this paper, the log Generalised Gamma marginals provided a better 
overall model fit. 

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Participating Life Insurance Contracts under Risk Based Solvency Frameworks: How to Increase Capital Efficiency by Product Design


Andreas Reuß, Jochen Ruß and Jochen Wieland


Abstract Traditional participating life insurance contracts with year-to-year (cliquet-style) guarantees have come under pressure in the current situation of low interest rates and volatile capital markets, in particular when priced in a market consistent valuation framework. In addition, such guarantees lead to rather high capital requirements under risk-based solvency frameworks such as Solvency II or the Swiss Solvency Test (SST). We introduce several alternative product designs and analyze their impact on the insurer’s financial situation. We also introduce a measure for Capital Efficiency that considers both profits and capital requirements, and compare the results of the innovative products to the traditional product design with respect to Capital Efficiency in a market consistent valuation model.

Keywords Capital efficiency • Participating life insurance • Embedded options • Interest rate guarantees • Market consistent valuation • Risk based capital requirements • Solvency II • SST


1 Introduction 

Traditional participating life insurance products play a major role in old-age provision in Continental Europe and in many other countries. These products typically come with a guaranteed benefit at maturity, which is calculated using some guaranteed minimum interest rate. Furthermore, the policyholders receive an annual surplus participation that depends on the performance of the insurer’s assets. With so-called cliquet-style guarantees, once such surplus has been assigned to the policy at the end of the year, it increases the guaranteed benefit based on the same guaranteed minimum interest rate. This product design can create significant financial risk.

Briys and de Varenne [8] were among the first to analyze the impact of interest rate guarantees on the insurer’s risk exposure. However, they considered a simple point-to-point guarantee where surplus (if any) is credited at maturity only. The financial risks of cliquet-style guarantee products were later investigated, e.g., by Grosen and Jørgensen [17]. They introduce the “average interest principle”, where the insurer aims to smooth future bonus distributions by using a bonus reserve as an additional buffer besides the policy reserve (the client’s account). Besides valuing the contract, they also calculate default probabilities (however, under the risk-neutral probability measure Q). Grosen et al. [19] extend the model of Grosen and Jørgensen [17] and introduce mortality risk. Grosen and Jørgensen [18] modify the model used by Briys and de Varenne [8] by incorporating a regulatory constraint for the insurer’s assets and analyze the consequences for the insurer’s risk policy. Miltersen and Persson [23] analyze a different cliquet-style guarantee framework with so-called terminal bonuses, whereas Bauer et al. [4] specifically investigate the valuation of participating contracts under the German regulatory framework.

While all this work focuses on the risk-neutral valuation of life insurance contracts (sometimes referred to as “financial approach”), Kling et al. [20, 21] concentrate on the risk a contract imposes on the insurer (sometimes referred to as “actuarial approach”) by means of shortfall probabilities under the real-world probability measure P.

Barbarin and Devolder [3] introduce a methodology that allows for combining the financial and actuarial approaches. They consider a contract similar to Briys and de Varenne [8] with a point-to-point guarantee and terminal surplus participation. To integrate both approaches, they use a two-step method of pricing life insurance contracts: first, they determine a guaranteed interest rate such that certain regulatory requirements are satisfied, using value at risk and expected shortfall risk measures. Second, to obtain fair contracts, they use risk-neutral valuation and adjust the participation in terminal surplus accordingly. Based on this methodology, Gatzert and Kling [14] investigate parameter combinations that yield fair contracts and analyze the risk implied by fair contracts for various contract designs. Gatzert [13] extends this approach by introducing the concept of “risk pricing” using the “fair value of default” to determine contracts with the same risk exposure. Graf et al. [16] (also building on Barbarin and Devolder [3]) derive the risk minimizing asset allocation for fair contracts using different risk measures such as the shortfall probability or the relative expected shortfall.

Under risk-based solvency frameworks such as Solvency II or the Swiss Solvency Test (SST), the risk analysis of interest rate guarantees becomes even more important. Under these frameworks, capital requirements are derived from a market consistent valuation considering the insurer’s risk. This risk is particularly high for long-term contracts with a year-to-year guarantee based on a fixed (i.e., not path dependent) guaranteed interest rate. Measuring and analyzing the financial risk in relation to the required capital, and analyzing new risk figures such as the Time Value of Options and Guarantees (TVOG), is a relatively new aspect that gains importance with the new solvency frameworks. For example, the largest German insurance company (Allianz) announced in a press conference on June 25, 2013¹ the introduction of a new participating life insurance product that (among other features) fundamentally modifies the type of interest rate guarantee (similar to what we propose in the remainder of this paper). It was stressed that the TVOG is significantly reduced for the new product. Also, it was mentioned that the increase of the TVOG resulting from an interest rate shock (i.e., the solvency capital requirement for interest rate risk) is reduced by 80 % when compared to the previous product. This is consistent with the findings of this paper.

The aim of this paper is a comprehensive risk analysis of different contract designs for participating life insurance products. Currently, there is an ongoing discussion about whether and how models assessing the insurer’s risk should be modified to reduce the capital requirements (e.g., by applying an “ultimate forward rate” set by the regulator). In contrast, we analyze how (for a given model) the insurer’s risk, and hence its capital requirement, can be influenced by product design. Since traditional cliquet-style participating life insurance products lead to very high capital requirements, we introduce alternative contract designs with modified types of guarantees, which reduce the insurer’s risk and profit volatility, and therefore also the capital requirements under risk-based solvency frameworks. In order to compare different product designs from an insurer’s perspective, we develop and discuss the concept of Capital Efficiency, which relates profit to capital requirements.² We identify the key drivers of Capital Efficiency, which are then used in our analyses to assess different product designs.

The remainder of this paper is structured as follows: 

In Sect. 2, we present the three contract designs considered, which all come with the same level of guaranteed maturity benefit but with different types of guarantee:

• Traditional product: a traditional contract with a cliquet-style guarantee based on a guaranteed interest rate greater than 0.

• Alternative product 1: a contract with the same guaranteed maturity benefit, which is, however, valid only at maturity; additionally, there is a 0 % year-to-year guarantee on the account value, meaning that the account value cannot decrease from one year to the next.

• Alternative product 2: a contract with the same guaranteed maturity benefit that is, however, valid only at maturity; there is no year-to-year guarantee on the account value, meaning that the account value may decrease in some years.


1 Cf. [1], particularly slide D24. 

2 Of course, there already exist other well-established measures linking profit to required capital, 
such as the return on risk-adjusted capital (RORAC). However, they may not be suitable to assess 
products with long-term guarantees since they consider the required capital on a one-year basis only. 
To the best of our knowledge, there is no common measure similar to what we define as Capital Efficiency that relates the profitability of an insurance contract to the risk it generates, and hence to the capital it requires, over the whole contract term.
On top of the different types of guarantees, all three products include a surplus
participation depending on the insurer’s return on assets. Our model is based on 
the surplus participation requirements given by German regulation. That means in 
particular that each year at least 90 % of the (book value) investment return has to be 
distributed to the policyholders. 

To illustrate the mechanics, we will first analyze the different products under 
different deterministic scenarios. This shows the differences in product design and 
how they affect the insurer’s risk. 

In Sect. 3, we introduce our stochastic model, which is based on a standard financial market model: the stock return and short rate processes are modeled using a correlated Black-Scholes and Vasicek model.³ We then describe how the evolution of the insurance portfolio and the insurer’s balance sheet are simulated in our asset-liability model. The considered asset allocation consists of stocks and of bonds with different maturities. The model also incorporates management rules as well as typical intertemporal risk sharing mechanisms (e.g., building and dissolving unrealized gains and losses), which are an integral part of participating contracts in many countries and should therefore not be neglected.

Furthermore, we introduce a measure for Capital Efficiency based on currently 
discussed solvency regulations such as the Solvency II framework. We also propose 
a more tractable measure for an assessment of the key drivers of Capital Efficiency. 

In Sect. 4, we present the numerical results. We show that the alternative products are significantly more capital efficient: financial risk, and therefore also the capital requirement, is significantly reduced, although in most scenarios all products provide the same maturity benefit to the policyholder.⁴ We observe that the typical “asymmetry”, in particular the heavy left tail of the insurer’s profit distribution, is reduced by the modified products. This leads to a significant reduction of both the TVOG and the solvency capital requirement for interest rate risk.

Section 5 concludes and provides an outlook for further research. 


2 Considered Products 


In this section, we describe the three contract designs considered. Note that, for the sake of simplicity, we assume that in case of death in year t, only the current account value AV_t (defined below) is paid at the end of year t. This allows us to ignore mortality in the calculation of premiums and actuarial reserves.


3 The correlated Black-Scholes and Vasicek model is applied in Zaglauer and Bauer [29] and Bauer 
et al. [5] in a similar way. 

4 Note: In scenarios where the products’ maturity benefits do differ, the difference is limited since 
the guaranteed maturity benefit (which is the same for all three products) is a lower bound for the 
maturity benefit. 
2.1 The Traditional Product


First, we consider a traditional participating life insurance contract with a cliquet-style guarantee. It provides a guaranteed benefit G at maturity T based on annual premium payments P. The pricing is based on a constant guaranteed interest rate i and reflects annual charges c_t. The actuarial principle of equivalence⁵ yields

$$\sum_{t=0}^{T-1} (P - c_t)\,(1+i)^{T-t} = G. \tag{1}$$


During the lifetime of the contract, the insurer has to build up sufficient (prospective) actuarial reserves AR_t for the guaranteed benefit based on the same constant interest rate i:

$$AR_t = G \cdot (1+i)^{-(T-t)} - \sum_{k=t}^{T-1} (P - c_k)\,(1+i)^{-(k-t)}. \tag{2}$$

The development of the actuarial reserves is then given by:

$$AR_t = (AR_{t-1} + P - c_{t-1}) \cdot (1+i).$$

Traditional participating life insurance contracts typically include an annual surplus participation that depends on the performance of the insurer’s assets. For example, German regulation requires that at least a “minimum participation” of p = 90 % of the (local GAAP book value) earnings on the insurer’s assets has to be credited to the policyholders’ accounts. For the traditional product, any surplus assigned to a contract immediately increases the guaranteed benefit based on the same interest rate i. More precisely, the surplus s_t is credited to a bonus reserve account BR_t (where BR_0 = 0), and the interest rate i also applies each year to the bonus reserve:

$$BR_t = BR_{t-1} \cdot (1+i) + s_t.$$

The client’s account value AV_t consists of the sum of the actuarial reserve AR_t and the bonus reserve BR_t; the maturity benefit is equal to AV_T.
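Formulas (1) and (2), the reserve recursion and the bonus reserve update can be checked numerically. The following Python sketch uses the illustrative values G = €20,000, T = 20 and i = 1.75 % from this section; the flat annual charge of 30 EUR is a hypothetical assumption, and the helper names are our own.

```python
def premium(G, T, i, charges):
    """Annual premium P from the equivalence principle (1):
    sum_{t=0}^{T-1} (P - c_t) (1+i)^(T-t) = G, solved for P."""
    acc = [(1 + i) ** (T - t) for t in range(T)]
    return (G + sum(c * a for c, a in zip(charges, acc))) / sum(acc)


def reserve(G, P, T, t, i, charges):
    """Prospective actuarial reserve AR_t from (2)."""
    return G * (1 + i) ** -(T - t) - sum(
        (P - charges[k]) * (1 + i) ** -(k - t) for k in range(t, T)
    )


def bonus_reserve(surpluses, i):
    """BR_t = BR_{t-1} (1+i) + s_t with BR_0 = 0; returns [BR_1, ..., BR_n]."""
    br, out = 0.0, []
    for s in surpluses:
        br = br * (1 + i) + s
        out.append(br)
    return out


# Illustrative parameters from the text; the flat 30 EUR charge is hypothetical.
G, T, i = 20_000.0, 20, 0.0175
charges = [30.0] * T
P = premium(G, T, i, charges)
ar = [reserve(G, P, T, t, i, charges) for t in range(T + 1)]
print(f"P = {P:.2f}, AR_0 = {ar[0]:.6f}, AR_T = {ar[T]:.2f}")
```

By construction AR_0 is zero (up to rounding) and AR_T equals G, and each AR_t satisfies the recursion AR_t = (AR_{t-1} + P - c_{t-1})(1 + i).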

As a consequence, each year at least the rate i has to be credited to the contracts. The resulting optionality is often referred to as asymmetry: if the asset return is sufficiently above i, a large part (e.g., p = 90 %) of the return is credited to the client as a surplus and the shareholders receive only a small portion (e.g., 1 − p = 10 %) of the return. If, on the other hand, the asset returns are below i, then 100 % of the shortfall has to be compensated by the shareholder. Additionally, if the insurer distributes a high surplus, this increases the insurer’s future risk, since the rate i has to be credited also to this surplus amount in subsequent years. Such products constitute a significant financial risk to the insurance company, in particular in a framework of low interest rates and volatile capital markets.⁶

5 For the equivalence principle, see e.g., Saxer [25], Wolthuis [28].

Fig. 1 Two illustrative deterministic scenarios for the traditional product: asset returns and yield distribution

The mechanics of this year-to-year guarantee are illustrated in Fig. 1 for two illustrative deterministic scenarios. We consider a traditional policy with term to maturity T = 20 years and a guaranteed benefit of G = €20,000. Following the current situation in Germany, we let i = 1.75 % and assume a surplus participation rate of p = 90 % on the asset returns.

The first scenario is not critical for the insurer. The asset return (which is here arbitrarily assumed for illustrative purposes) starts at 3 %, then drops over time to 2 % and increases back to 3 %; the x axis shows the policy year. The chart shows this asset return, the “client’s yield” (i.e., the interest credited to the client’s account including surplus), the “required yield” (defined as the minimum rate that has to be credited to the client’s account), and the insurer’s yield (the portion of the asset return that goes to the shareholder). Since even 90 % of the lowest asset return (0.9 × 2 % = 1.8 %) exceeds i = 1.75 %, the client’s yield in this simple example always amounts to 90 % of the asset return and the insurer’s yield always amounts to 10 % of the asset return. By definition, for this contract design, the required yield is constant and always coincides with i = 1.75 %.

In the second scenario, we let the asset return drop all the way down to 1 %. 
Whenever 90 % of the asset return would be less than the required yield, the insurer 
has to credit the required yield to the account value. This happens at the shareholder’s 
expense, i.e., the insurer’s yield is reduced and even becomes negative. This means 
that a shortfall occurs and the insurer has to provide additional funds. 
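The yield split in the traditional product follows from the rule that the client receives the larger of p times the asset return and i. A minimal Python sketch with p = 90 % and i = 1.75 % as in the text (the helper name `split_yield` is hypothetical): the shareholder's share falls below 10 % of the asset return once 0.9 × asset return < i, i.e. for asset returns below about 1.94 %, and turns negative once the asset return falls below i = 1.75 %.

```python
def split_yield(asset_return, i=0.0175, p=0.9):
    """Traditional product: the client's yield is max(p * return, i);
    the insurer's yield is whatever is left of the asset return."""
    client = max(p * asset_return, i)
    return client, asset_return - client


for r in (0.03, 0.02, 0.01):  # illustrative asset returns from the two scenarios
    client, insurer = split_yield(r)
    print(f"asset {r:.2%}: client {client:.3%}, insurer {insurer:+.3%}")
```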

It is worthwhile noting that in this traditional product design, the interest rate i plays three different roles:

• pricing interest rate i_p, used for determining the ratio between the premium and the guaranteed maturity benefit,

• reserving interest rate i_r, i.e., the technical interest rate used for the calculation of the prospective actuarial reserves,

• year-to-year minimum guaranteed interest rate i_g, i.e., a minimum return on the account value.


6 This was also a key result of the QIS5 final report preparing for Solvency II, cf. [2, 11]. 
2.2 Alternative Products

We will now introduce two alternative product designs, which are based on the idea of allowing different values for the pricing rate, the reserving rate and the year-to-year minimum guaranteed interest rate on the account value. Formulas (1) and (2) then translate into the following relations between the annual premium, the guaranteed benefit and the actuarial reserves:


$$\sum_{t=0}^{T-1} (P - c_t)\,(1+i_p)^{T-t} = G,$$

$$AR_t = G \cdot (1+i_r)^{-(T-t)} - \sum_{k=t}^{T-1} (P - c_k)\,(1+i_r)^{-(k-t)}.$$

Note that in the first years of the contract, negative values for AR_t are possible in case of i_p < i_r, which implies a “financial buffer” at the beginning of the contract. The year-to-year minimum guaranteed interest rate i_g does not enter the formulae above; it is simply a restriction on the development of the client’s account, i.e.,

$$AV_t \ge (AV_{t-1} + P - c_{t-1}) \cdot (1 + i_g),$$

where AV_0 = max{AR_0, 0} is the initial account value of the contract.

The crucial difference between such new participating products and traditional participating products is that the guaranteed maturity benefit is not explicitly increased during the lifetime of the contract (but, of course, an increase in the account value combined with the year-to-year minimum guaranteed interest rate can implicitly increase the maturity guarantee).

In this setting, the prospective reserve AR t is only a minimum reserve for the 
guaranteed maturity benefit: The insurer has to make sure that the account value 
does not fall below this minimum reserve. This results in a “required yield” explained 
below. Under “normal” circumstances the account value (which is also the surrender 
value) exceeds the minimum reserve. Therefore, the technical reserve (under local 
GAAP), which may not be below the surrender value, coincides with the account 
value. 

The required yield on the account value in year t is equal to

$$z_t = \max\left\{ \frac{\max\{AR_t, 0\}}{AV_{t-1} + P - c_{t-1}} - 1,\; i_g \right\}. \tag{3}$$


The left part of (3) ensures that the account value is nonnegative and never lower than the actuarial reserve; the right part enforces the year-to-year minimum guaranteed interest rate i_g. The required yield decreases if the bonus reserve (which is included in AV_{t-1}) increases.
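The required yield (3) is straightforward to compute. The Python sketch below is illustrative only: the helper name, the premium of 800 and the previous reserve of 1,000 are hypothetical, and charges are set to zero. Without a bonus buffer, (3) reproduces the reserving rate i_r; a bonus buffer pushes the required yield down until the floor i_g binds.

```python
def required_yield(ar_t, av_prev, premium, charge_prev, i_g):
    """Required yield z_t of (3) for the alternative products."""
    base = av_prev + premium - charge_prev  # account value after premium and charges
    return max(max(ar_t, 0.0) / base - 1.0, i_g)


i_r, P = 0.0175, 800.0
ar_prev = 1_000.0
ar_t = (ar_prev + P) * (1 + i_r)  # reserve recursion with zero charges (hypothetical)

# No bonus buffer (AV_{t-1} = AR_{t-1}): z_t equals i_r up to rounding
print(required_yield(ar_t, ar_prev, P, 0.0, -1.0))
# Bonus buffer of 100, alternative 2 (i_g = -100 %): z_t becomes negative
print(required_yield(ar_t, ar_prev + 100.0, P, 0.0, -1.0))
# Same buffer, alternative 1 (i_g = 0 %): z_t is floored at zero
print(required_yield(ar_t, ar_prev + 100.0, P, 0.0, 0.0))
```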
The surplus participation rules remain unchanged: the policyholders’ share p (e.g., 90 %) of the asset return is credited to the policyholders (but not less than z_t). Hence, as long as the policyholders’ share is always above the technical interest rate used in the traditional product, there is no difference between the traditional and the alternative product designs.

Obviously, only combinations fulfilling i_g ≤ i_p ≤ i_r result in suitable products: if the first inequality is violated, then the year-to-year minimum guaranteed interest rate results in a higher (implicitly) guaranteed maturity benefit than the (explicit) guarantee resulting from the pricing rate. If the second inequality is violated, then additional reserves (exceeding the first premium) are required at t = 0.

In what follows, we will consider two concrete alternative contract designs. Obviously, the choice of i_g fundamentally changes the mechanics of the guarantee embedded in the product (or the “type” of guarantee), whereas the choice of i_p changes the level of the guarantee. Since the focus of this paper is on the effect of the different guarantee mechanisms, we use a pricing rate that coincides with the technical rate of the traditional product. Hence, the guaranteed maturity benefit remains unchanged. Since the legally prescribed maximum value for the reserving rate also coincides with the technical rate of the traditional product, we get i_p = i_r = 1.75 % for both considered alternative designs.

In our alternative product 1, we set i_g = 0 % (0 % year-to-year guarantee), and for alternative 2 we set i_g = −100 % (no year-to-year guarantee). In order to illustrate the mechanics of the alternative products, Figs. 2 and 3 show the two scenarios from Fig. 1 for both alternative products. In the first scenario (shown on the left), the required yield z_t on the account value gradually decreases for both alternative contract designs, since the bonus reserve acts as a buffer (as described above). For alternative 1, the required yield cannot fall below i_g = 0 %, while for alternative 2 it even becomes negative after some years.

The adverse scenario on the right shows that the required yield rises again after 
years with low asset returns since the buffer is reduced. However, contrary to the 
traditional product, the asset return stays above the required level and no shortfall 
occurs. 


[Figure: panels “Non-critical scenario for alternative 1” and “Adverse scenario for alternative 1”]

Fig. 2 Two illustrative deterministic scenarios for alternative 1 product: asset returns and yield distribution

Fig. 3 Two illustrative deterministic scenarios for alternative 2 product: asset returns and yield distribution


From a policyholder’s perspective, both alternative contract designs provide the 
same maturity benefit as the traditional contract design in the first scenario since the 
client’s yield is always above 1.75 %. In the second scenario, however, the maturity 
benefit is slightly lower for both alternative contract designs since (part of) the buffer 
built up in years 1 to 8 can be used to avoid a shortfall. In this scenario, the two 
alternative products coincide, since the client’s yield is always positive. 

Even if scenarios where the products differ appear (or are) unlikely, the modification has a significant impact on the insurer’s solvency requirements, since the financial risks, particularly in adverse scenarios, are a key driver of the solvency capital requirement. This will be considered in a stochastic framework in the following sections.
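The deterministic comparison of Figs. 1 to 3 can be mimicked with a small projection of a single policy. The Python sketch below is a simplification under stated assumptions, not the authors' model: charges are zero, mortality and surrender are ignored, i_p = i_r = i = 1.75 %, p = 90 %, and the two asset return paths are hypothetical stand-ins for the scenarios in the figures. In this simplified setting the traditional product is the special case in which the required yield on the account value is always i (because AR_t and BR_t both accrue at i), so one function covers all three designs. The shortfall accumulates the amounts by which the client's yield exceeds the asset return, i.e. years with a negative insurer's yield.

```python
def premium(G, T, i_p):
    # Equivalence principle with all charges c_t = 0 (simplifying assumption)
    return G / sum((1 + i_p) ** (T - t) for t in range(T))


def reserve(G, P, T, t, i_r):
    # Prospective reserve with c_k = 0
    return G * (1 + i_r) ** -(T - t) - sum(P * (1 + i_r) ** -(k - t) for k in range(t, T))


def project(asset_returns, G=20_000.0, i=0.0175, p=0.9, i_g=None):
    """Deterministic projection of one policy.

    i_g=None models the traditional cliquet product (required yield i each year);
    otherwise an alternative product with i_p = i_r = i and year-to-year guarantee i_g.
    Returns (maturity benefit AV_T, total shortfall borne by the shareholders).
    """
    T = len(asset_returns)
    P = premium(G, T, i)
    av = max(reserve(G, P, T, 0, i), 0.0)  # AV_0
    shortfall = 0.0
    for t, r in enumerate(asset_returns, start=1):
        base = av + P                       # AV_{t-1} + P - c_{t-1} with c = 0
        if i_g is None:
            z = i
        else:
            z = max(max(reserve(G, P, T, t, i), 0.0) / base - 1.0, i_g)  # eq. (3)
        y = max(p * r, z)                   # client's yield
        shortfall += base * max(y - r, 0.0) # negative insurer's yield
        av = base * (1 + y)
    return av, shortfall


# Hypothetical paths in the spirit of Fig. 1 (not the paper's exact figures)
non_critical = [0.03] * 5 + [0.025] * 5 + [0.02] * 4 + [0.025] * 3 + [0.03] * 3
adverse = [0.03] * 5 + [0.02] * 3 + [0.01] * 4 + [0.02] * 4 + [0.03] * 4

for name, path in (("non-critical", non_critical), ("adverse", adverse)):
    for label, i_g in (("traditional", None), ("alternative 1", 0.0), ("alternative 2", -1.0)):
        av_T, loss = project(path, i_g=i_g)
        print(f"{name:12s} {label:13s} AV_T = {av_T:9.2f}  shortfall = {loss:7.2f}")
```

Because the alternative products' required yield never exceeds i in this setting, their account values and shortfalls are bounded by those of the traditional product, while the maturity benefit never falls below G.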


3 Stochastic Modeling and Analyzed Key Figures 

Since surplus participation is typically based on local GAAP book values (in particular in Continental Europe), we use a stochastic balance sheet and cash flow projection model for the analysis of the product designs presented in the previous section. The model includes management rules concerning asset allocation, reinvestment strategy, handling of unrealized gains and losses, and surplus distribution. Since the focus of the paper is on the valuation of future profits and capital requirements, we will introduce the model under a risk-neutral measure. Similar models have been used (also in a real-world framework) in Kling et al. [20, 21] and Graf et al. [16].


3.1 The Financial Market Model 

We assume that the insurer’s assets are invested in coupon bonds and stocks. We 
treat both assets as risky assets in a risk-neutral, frictionless and continuous financial 
market. Additionally, cash flows during the year are invested in a riskless bank account (until assets are reallocated). We let the short rate process r_t follow a Vasicek⁷ model, and the stock price S_t follow a geometric Brownian motion:

$$dr_t = \kappa(\theta - r_t)\,dt + \sigma_r\,dW_t^{(1)} \quad\text{and}\quad \frac{dS_t}{S_t} = r_t\,dt + \rho\,\sigma_S\,dW_t^{(1)} + \sqrt{1-\rho^2}\,\sigma_S\,dW_t^{(2)},$$

where W_t^(1) and W_t^(2) each denote a Wiener process on some probability space (Ω, F, Q) with a risk-neutral measure Q and the natural filtration 𝔽 = (F_t), F_t = σ((W_s^(1), W_s^(2)), s ≤ t). The parameters κ, θ, σ_r, σ_S and ρ are deterministic and constant. For the purpose of performing Monte Carlo simulations, the stochastic differential equations can be solved to


$$S_t = S_{t-1} \cdot \exp\left( \int_{t-1}^{t} r_u\,du - \frac{\sigma_S^2}{2} + \int_{t-1}^{t} \rho\,\sigma_S\,dW_u^{(1)} + \int_{t-1}^{t} \sqrt{1-\rho^2}\,\sigma_S\,dW_u^{(2)} \right)$$

and

$$r_t = e^{-\kappa}\, r_{t-1} + \theta\,(1 - e^{-\kappa}) + \int_{t-1}^{t} \sigma_r\, e^{-\kappa(t-u)}\,dW_u^{(1)},$$

where S_0 = 1 and the initial short rate r_0 is a deterministic parameter. Then, the bank account is given by B_t = exp(∫_0^t r_u du). It can be shown that the four (stochastic) integrals in the formulae above follow a joint normal distribution.⁸ Monte Carlo paths are calculated using random realizations of this multidimensional distribution.
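The joint normality of the four integrals can be exploited directly for exact yearly simulation. The following Python sketch is one possible implementation, assuming NumPy and yearly time steps; the covariance entries are our own derivation via the Itô isometry rather than formulas taken from the paper, and the parameter values in the example are hypothetical.

```python
import numpy as np


def yearly_covariance(kappa, sigma_r, sigma_s, rho):
    """Covariance over one year (t-1, t] of the Gaussian vector (X, J, Y1, Y2),
    obtained via the Ito isometry, where with x = t - u:
      X  = int sigma_r e^{-kappa x} dW1                (innovation of r_t)
      J  = int sigma_r (1 - e^{-kappa x}) / kappa dW1  (random part of int r_u du)
      Y1 = int rho sigma_s dW1,  Y2 = int sqrt(1 - rho^2) sigma_s dW2
    """
    e1 = (1 - np.exp(-kappa)) / kappa            # int_0^1 e^{-kappa x} dx
    e2 = (1 - np.exp(-2 * kappa)) / (2 * kappa)  # int_0^1 e^{-2 kappa x} dx
    var_x = sigma_r**2 * e2
    var_j = sigma_r**2 / kappa**2 * (1 - 2 * e1 + e2)
    cov_xj = sigma_r**2 / kappa * (e1 - e2)
    cov_xy1 = sigma_r * rho * sigma_s * e1
    cov_jy1 = sigma_r * rho * sigma_s / kappa * (1 - e1)
    return np.array([
        [var_x, cov_xj, cov_xy1, 0.0],
        [cov_xj, var_j, cov_jy1, 0.0],
        [cov_xy1, cov_jy1, (rho * sigma_s) ** 2, 0.0],
        [0.0, 0.0, 0.0, (1 - rho**2) * sigma_s**2],
    ])


def simulate(r0, kappa, theta, sigma_r, sigma_s, rho, years, n_paths, seed=0):
    """Exact yearly simulation of r_t, S_t and B_t; returns three arrays of
    shape (years + 1, n_paths) with S_0 = B_0 = 1."""
    rng = np.random.default_rng(seed)
    cov = yearly_covariance(kappa, sigma_r, sigma_s, rho)
    e1 = (1 - np.exp(-kappa)) / kappa
    decay = np.exp(-kappa)
    r = np.full(n_paths, float(r0))
    log_s = np.zeros(n_paths)
    log_b = np.zeros(n_paths)
    rates, stocks, banks = [r.copy()], [np.ones(n_paths)], [np.ones(n_paths)]
    for _ in range(years):
        x, j, y1, y2 = rng.multivariate_normal(np.zeros(4), cov, size=n_paths).T
        int_r = e1 * r + theta * (1 - e1) + j      # int_{t-1}^t r_u du, uses r_{t-1}
        log_s += int_r - 0.5 * sigma_s**2 + y1 + y2
        log_b += int_r
        r = decay * r + theta * (1 - decay) + x    # r_t from r_{t-1}
        rates.append(r.copy())
        stocks.append(np.exp(log_s))
        banks.append(np.exp(log_b))
    return np.array(rates), np.array(stocks), np.array(banks)


rates, stocks, banks = simulate(0.02, 0.1, 0.03, 0.01, 0.2, 0.15, years=1, n_paths=200_000)
print(rates[1].mean(), (stocks[1] / banks[1]).mean())
```

Two simple sanity checks under Q: the mean of r_1 should be close to e^{-κ} r_0 + θ(1 − e^{-κ}), and the discounted stock S_t / B_t should have mean close to 1.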
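The discretely compounded yield curve at time t is then given by⁹

$$r_t(s) = \exp\left( \frac{1}{s}\left[ \frac{1-e^{-\kappa s}}{\kappa}\, r_t + \left(s - \frac{1-e^{-\kappa s}}{\kappa}\right)\left(\theta - \frac{\sigma_r^2}{2\kappa^2}\right) + \left(\frac{1-e^{-\kappa s}}{\kappa}\right)^2 \frac{\sigma_r^2}{4\kappa} \right] \right) - 1$$

for any time t and term s > 0. Based on the yield curves, we calculate par yields that determine the coupon rates of the considered coupon bonds.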
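The yield curve formula and the par yield calculation can be sketched as follows in Python. The par yield assumes annual coupon payments, which this section does not specify; function names and parameter values are hypothetical. With σ_r = 0 and θ = r_t, the curve is flat at e^{r_t} − 1 and the par yield equals that flat rate, which provides a simple check.

```python
import math


def vasicek_discrete_yield(r_t, s, kappa, theta, sigma_r):
    """Discretely compounded Vasicek zero-coupon yield r_t(s) for term s > 0."""
    b = (1 - math.exp(-kappa * s)) / kappa
    continuous = (
        b * r_t
        + (s - b) * (theta - sigma_r**2 / (2 * kappa**2))
        + b**2 * sigma_r**2 / (4 * kappa)
    ) / s
    return math.exp(continuous) - 1


def par_yield(r_t, maturity, kappa, theta, sigma_r):
    """Coupon rate of a bond with annual coupons (assumption) priced at par."""
    discount = [
        (1 + vasicek_discrete_yield(r_t, k, kappa, theta, sigma_r)) ** -k
        for k in range(1, maturity + 1)
    ]
    return (1 - discount[-1]) / sum(discount)


print(par_yield(0.02, 10, kappa=0.1, theta=0.03, sigma_r=0.01))
```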


7 Cf. [27]. 

8 Cf. Zaglauer and Bauer [29]. A comprehensive explanation of this property is included in 
Bergmann [6]. 

9 See Seyboth [26] as well as Branger and Schlag [7].
Table 1 Balance sheet at time t

Assets       Liabilities
BV_t^S       X_t
BV_t^B       AV_t

3.2 The Asset-Liability Model 


The insurer’s simplified balance sheet at time t is given by Table 1. Since our analysis is performed for a specific portfolio of insurance contracts on a stand-alone basis, there is no explicit allowance for shareholders’ equity or other reserves on the liability side. Rather, X_t denotes the shareholders’ profit or loss in year t, with the corresponding cash flow at the beginning of the next year. Together with AV_t as defined in Sect. 2, this constitutes the liability side of our balance sheet.

In our projection of the assets and insurance contracts, incoming cash flows (premium payments at the beginning of the year, coupon payments and repayment of nominal at the end of the year) and outgoing cash flows (expenses at the beginning of the year and benefit payments at the end of the year) occur. In each simulation path, cash flows occurring at the beginning of the year are invested in a bank account. At the end of the year, the market values of the stocks and coupon bonds are derived, and the asset allocation is readjusted according to a rebalancing strategy with a constant stock ratio q based on market values. Correspondingly, a share (1 − q) is invested in bonds, and any money in the bank account is withdrawn and invested in the portfolio consisting of stocks and bonds.

If additional bonds need to be bought in the process of rebalancing, the corresponding amount is invested in coupon bonds yielding at par with term M. However, toward the end of the projection, when the insurance contracts’ remaining term is less than M years, we invest in bonds with a term that coincides with the longest remaining term of the insurance contracts. If bonds need to be sold, they are sold proportionally to the market values of the different bonds in the existing portfolio.
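A minimal sketch of the year-end rebalancing step, assuming all positions are measured at market value and that newly bought par bonds are represented only by their purchase amount. The helper names are hypothetical.

```python
def rebalance(stock_mv, bond_mvs, bank, q):
    """One year-end rebalancing step (hypothetical helper).

    stock_mv: market value of the stock portfolio
    bond_mvs: market values of the coupon bonds currently held
    bank:     money on the bank account (fully withdrawn and reinvested)
    q:        constant stock ratio based on market values
    Returns (new stock value, values of existing bonds, amount of new par bonds bought).
    """
    total = stock_mv + sum(bond_mvs) + bank
    target_bonds = (1.0 - q) * total
    current_bonds = sum(bond_mvs)
    if target_bonds >= current_bonds:
        # Buy new par bonds for the difference; existing bonds are kept unchanged.
        return q * total, list(bond_mvs), target_bonds - current_bonds
    # Sell bonds proportionally to their market values.
    scale = target_bonds / current_bonds
    return q * total, [mv * scale for mv in bond_mvs], 0.0


def new_bond_term(M, longest_remaining_term):
    """Term of newly bought par bonds: M, or the longest remaining contract term if shorter."""
    return min(M, longest_remaining_term)
```

For example, rebalance(60.0, [30.0, 10.0], 0.0, 0.5) returns (50.0, [30.0, 10.0], 10.0): stocks are reduced to half of the total and 10.0 is invested in new par bonds.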

With respect to accounting, we use book-value accounting rules following German GAAP, which may result in unrealized gains or losses (UGL): coupon bonds are considered as held to maturity, and their book value BV_t^B is always given by their nominal amounts (irrespective of whether the market value is higher or lower). In contrast, the insurer has some discretion regarding the book value of the stocks BV_t^S.

Of course, interest rate movements as well as the rebalancing will cause fluctuations in the UGL of bonds. Also, the rebalancing may lead to the realization of UGL of stocks. In addition, we assume a further management rule with respect to UGL of stocks: the insurer wants to create rather stable book value returns (and hence surplus distributions) in order to signal stability to the market. We therefore assume that a ratio δ_pos of the UGL of stocks is realized annually if unrealized gains exist, and a ratio δ_neg of the UGL is realized annually if unrealized losses exist. In particular, δ_neg = 100 % has to be chosen in a legal framework where unrealized losses on stocks are not possible.
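The management rule for UGL on stocks can be sketched as a simple update of the stock book value. The function name is hypothetical; the example values δ_pos = 20 % and δ_neg = 100 % are those of Table 5.

```python
def realize_stock_ugl(market_value, book_value, delta_pos, delta_neg):
    """Annual management rule for unrealized gains/losses (UGL) on stocks.

    A share delta_pos of unrealized gains, or delta_neg of unrealized losses,
    is realized, i.e. moved into the book value. Returns (realized amount,
    new book value); the realized amount enters the book-value asset return.
    """
    ugl = market_value - book_value
    ratio = delta_pos if ugl > 0 else delta_neg
    realized = ratio * ugl
    return realized, book_value + realized


# Unrealized gain of 10 with delta_pos = 20 %: 2 is realized.
gain_case = realize_stock_ugl(110.0, 100.0, 0.20, 1.00)
# Unrealized loss of 10 with delta_neg = 100 %: the full loss is realized.
loss_case = realize_stock_ugl(90.0, 100.0, 0.20, 1.00)
```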
Based on this model, the total asset return on a book value basis can be calculated in each simulation path each year as the sum of coupon payments from bonds, interest payments on the bank account, and the realization of UGL. The split between policyholders and shareholders is driven by the minimum participation parameter p explained in Sect. 2. If the cumulative required yield on the account values of all policyholders is larger than the policyholders’ share of the asset return, there is no surplus for the policyholders, and exactly the respective required yield z_t is credited to every account. Otherwise, surplus is credited, which amounts to the difference between the policyholders’ share of the asset return and the cumulative required yield. Following typical practice, e.g., in Germany, we assume that this surplus is distributed among the policyholders such that all policyholders receive the same client’s yield (defined as the required yield plus the surplus rate), if possible. To achieve this, we apply an algorithm that sorts the accounts by required yield, i.e., z_t^(1), ..., z_t^(k), k ∈ N, in ascending order. First, all contracts receive their respective required yield. Then, the available surplus is distributed: starting with the contract(s) with the lowest required yield z_t^(1), the algorithm distributes the available surplus to all these contracts until the gap to the next required yield z_t^(2) is filled. Then, all the contracts with a required yield lower than or equal to z_t^(2) receive an equal amount of (relative) surplus until the gap to z_t^(3) is filled, and so on. This is continued until the entire surplus is distributed. The result is that all contracts receive the same client’s yield if this unique client’s yield exceeds the required yield of all contracts. Otherwise, there exists a threshold z* such that all contracts with a required yield above z* receive exactly their required yield (and no surplus), and all contracts with a required yield below z* receive z* (i.e., they receive some surplus).
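The surplus distribution described above is a water-filling scheme. The following Python sketch is one possible implementation under stated assumptions: the policyholders’ total share is p times the book-value asset return, the surplus is measured as a monetary amount, and relative surplus is allocated in proportion to account values. The function names are hypothetical.

```python
def client_yield_threshold(account_values, required_yields, surplus):
    """Water-filling level z* for distributing a monetary surplus.

    Contracts are processed in ascending order of required yield; the lowest
    ones are lifted together until they reach the next required yield, and so
    on. Each contract i is finally credited max(z_i, z*).
    """
    order = sorted(range(len(required_yields)), key=lambda i: required_yields[i])
    level = required_yields[order[0]]
    weight = 0.0          # account value of all contracts currently at the common level
    remaining = surplus
    idx = 0
    while True:
        # Add every contract whose required yield has been reached.
        while idx < len(order) and required_yields[order[idx]] <= level:
            weight += account_values[order[idx]]
            idx += 1
        if idx == len(order):
            # All contracts share the same client's yield.
            return level + remaining / weight
        next_level = required_yields[order[idx]]
        cost = weight * (next_level - level)  # surplus needed to close the gap
        if remaining <= cost:
            return level + remaining / weight
        remaining -= cost
        level = next_level


def credited_yields(account_values, required_yields, asset_return, p):
    """Client's yields: required yields plus water-filled surplus, where the
    surplus is p * asset_return minus the cumulative required yield (if positive)."""
    required = sum(av * z for av, z in zip(account_values, required_yields))
    surplus = max(0.0, p * asset_return - required)
    z_star = client_yield_threshold(account_values, required_yields, surplus)
    return [max(z, z_star) for z in required_yields]
```

With two accounts of 100 each, required yields of 0 % and 2 %, p = 90 % and an asset return of 3.0, the policyholders’ share of 2.7 exceeds the required 2.0 by 0.7, which only suffices to lift the first contract to 0.7 %; with an asset return of 10.0 both contracts reach a common client’s yield of 4.5 %.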

From this, the insurer’s profit X_t results as the difference between the total asset return and the amount credited to all policyholder accounts. If the profit is negative, a shortfall has occurred, which we assume to be compensated by a corresponding capital inflow (e.g., from the insurer’s shareholders) at the beginning of the next year.¹⁰ Balance sheet and cash flows are projected over τ years until all policies that are in force at time zero have matured.
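Combined with the surplus rule, the profit definition reduces to X_t = R_t − max(Z_t, p · R_t), where R_t denotes the total book-value asset return and Z_t the cumulative required yield amount (notation introduced here only for this sketch). A minimal Python version, with a hypothetical function name:

```python
def shareholder_profit(asset_return, required_amount, p):
    """Insurer's profit X_t: asset return minus the amount credited to policyholders.

    The credited amount is the larger of the cumulative required yield and the
    policyholders' share p * asset_return. A negative value is a shortfall,
    assumed to be covered by a capital inflow at the beginning of the next year.
    """
    credited = max(required_amount, p * asset_return)
    return asset_return - credited
```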


3.3 Key Drivers for Capital Efficiency 


The term Capital Efficiency is frequently used in an intuitive sense, in particular 
among practitioners, to describe the feasibility, profitability, capital requirement, 
and riskiness of products under risk-based solvency frameworks. However, to the 
best of our knowledge, no formal definition of this term exists. Nevertheless, it 
seems obvious that capital requirement alone is not a suitable figure for managing a 


10 We do not consider the shareholders’ default put option resulting from their limited liability, which is in line with both Solvency II valuation standards and the Market Consistent Embedded Value framework (MCEV), cf. e.g., [5] or [10], Sect. 5.3.4.
product portfolio from an insurer’s perspective. Rather, capital requirement and the
resulting cost of capital should be considered in relation to profitability. 

Therefore, a suitable measure of Capital Efficiency could be some ratio of profitability and capital requirement, e.g., based on the distribution of the random variable

$$\frac{\sum_{t=1}^{\tau} X_t / B_t}{\sum_{t=1}^{\tau} \mathrm{RC}_{t-1}\,\mathrm{CoC}_t / B_t}. \qquad (4)$$


The numerator represents the present value of the insurer’s future profits, whereas the denominator is equal to the present value of the future cost of capital: RC_t denotes the required capital at time t under some risk-based solvency framework, i.e., the amount of shareholders’ equity needed to support the business in force. The cost of capital is derived by applying the cost of capital rate CoC_t for year t to the required capital at the beginning of this year.¹¹ In practical applications, however, the distribution of this ratio might not be easy to calculate. Therefore, moments of this distribution, a separate analysis of (moments of) the numerator and the denominator, or even just an analysis of key drivers for that ratio could provide some insight.
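For a single simulated scenario, ratio (4) can be evaluated as sketched below. The indexing (RC_{t−1} paired with CoC_t and B_t) follows the description above; the function name and the illustrative numbers are hypothetical.

```python
def capital_efficiency_ratio(profits, required_capital, coc_rates, bank_account):
    """Scenario value of ratio (4) (hypothetical helper).

    profits[t-1]         = X_t   for t = 1..tau
    required_capital[t]  = RC_t  for t = 0..tau-1 (capital at the beginning of year t+1)
    coc_rates[t-1]       = CoC_t for t = 1..tau
    bank_account[t-1]    = B_t   for t = 1..tau (discounting numeraire)
    """
    numerator = sum(x / b for x, b in zip(profits, bank_account))
    denominator = sum(
        rc * coc / b for rc, coc, b in zip(required_capital, coc_rates, bank_account)
    )
    return numerator / denominator


# Illustrative numbers only: two years, profit 2 per year, capital 10, CoC rate 5 %, no discounting.
example_ratio = capital_efficiency_ratio([2.0, 2.0], [10.0, 10.0], [0.05, 0.05], [1.0, 1.0])
```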

In this spirit, we will use a Monte Carlo framework to calculate the following key 
figures using the model described above: 

A typical market consistent measure for the insurer’s profitability is the expected present value of future profits (PVFP),¹² which corresponds to the expected value of the numerator in (4). The PVFP is estimated based on Monte Carlo simulations:

$$\widehat{\mathrm{PVFP}} = \frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{\tau}\frac{X_t^{(n)}}{B_t^{(n)}} = \frac{1}{N}\sum_{n=1}^{N}\mathrm{PVFP}^{(n)},$$

where N is the number of scenarios, X_t^(n) denotes the insurer’s profit/loss in year t in scenario n, B_t^(n) is the value of the bank account after t years in scenario n, and hence PVFP^(n) is the present value of future profits in scenario n.
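The Monte Carlo estimator can be written directly from the formula. This sketch assumes that profits and bank-account values are supplied per scenario as lists indexed by t = 1, ..., τ; the helper names are hypothetical.

```python
def pvfp_scenario(profits, bank_account):
    """PVFP^(n) = sum over t of X_t^(n) / B_t^(n) for one scenario (t = 1..tau)."""
    return sum(x / b for x, b in zip(profits, bank_account))


def pvfp_estimate(profit_paths, bank_paths):
    """Monte Carlo estimate of PVFP: mean of PVFP^(n) over the N scenarios.

    Also returns the scenario values, whose empirical distribution is what the
    histograms of PVFP^(n) display."""
    values = [pvfp_scenario(x, b) for x, b in zip(profit_paths, bank_paths)]
    return sum(values) / len(values), values
```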

In addition, the degree of asymmetry of the shareholders’ cash flows can be characterized by the distribution of PVFP^(n) over all scenarios¹³ and by the time value of options and guarantees (TVOG). Under the MCEV framework,¹⁴ the latter is defined by

$$\mathrm{TVOG} = \mathrm{PVFP}_{CE} - \mathrm{PVFP},$$


11 This approach is similar to the calculation of the cost of residual nonhedgeable risk as introduced in the MCEV Principles in [9], although RC_t reflects the total capital requirement including hedgeable risks.

12 The concept of PVFP is introduced as part of the MCEV Principles in [9]. 

13 Note that this is a distribution under the risk-neutral measure and has to be interpreted carefully. 
However, it can be useful for explaining differences between products regarding PVFP and TVOG. 

14 Cf. [9]. 
Table 2 Product parameters I

| | Traditional product (%) | Alternative 1 (%) | Alternative 2 (%) |
|---|---|---|---|
| i_p, i_r | 1.75 | 1.75 | 1.75 |
| h | 1.75 | 0 | -100 |

where

$$\mathrm{PVFP}_{CE} = \sum_{t=1}^{\tau}\frac{X_t^{(CE)}}{B_t^{(CE)}}$$

is the present value of future profits in the so-called “certainty equivalent” (CE) scenario. This deterministic scenario reflects the expected development of the capital market under the risk-neutral measure. It can be derived from the initial yield curve r_0(s) based on the assumption that all assets earn the forward rate implied by the initial yield curve.¹⁵ The TVOG is also used as an indicator for capital requirement under risk-based solvency frameworks.
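The CE scenario and the TVOG can be sketched as follows, assuming continuously compounded spot yields, so that the one-year forward rate for year t is t · r_0(t) − (t − 1) · r_0(t − 1). The helper names are hypothetical; the TVOG example uses the traditional product’s base-case figures reported in Sect. 4.2 (PVFP_CE = 4.26 %, PVFP = 3.63 %).

```python
import math


def ce_bank_account(spot_yield, tau):
    """Bank account B_t^(CE) for t = 1..tau in the certainty-equivalent scenario.

    Each year t earns the one-year forward rate implied by the initial curve,
    f_t = t * r_0(t) - (t - 1) * r_0(t - 1) (continuous compounding), so that
    B_t^(CE) = exp(t * r_0(t)).
    """
    balance, path = 1.0, []
    for t in range(1, tau + 1):
        previous = (t - 1) * spot_yield(t - 1) if t > 1 else 0.0  # avoids evaluating r_0(0)
        forward = t * spot_yield(t) - previous
        balance *= math.exp(forward)
        path.append(balance)
    return path


def tvog(pvfp_ce, pvfp):
    """Time value of options and guarantees: PVFP_CE - PVFP."""
    return pvfp_ce - pvfp


traditional_tvog = tvog(4.26, 3.63)  # in % of the present value of premium income
```

Discounting the CE-scenario profits X_t^(CE) with this path, e.g. sum(x / b for x, b in zip(profits_ce, path)), gives PVFP_CE.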

Comparing the PVFP for two different interest rate levels, one that we call the basic level and a significantly lower one that we call the stress level, provides another important key figure for interest rate risk and capital requirements. In the standard formula¹⁶ of the Solvency II framework,

$$\Delta\mathrm{PVFP} = \mathrm{PVFP}(\text{basic}) - \mathrm{PVFP}(\text{stress})$$

determines the contribution of the respective product to the solvency capital requirement for interest rate risk (SCR_int). Therefore, we also focus on this figure, which primarily drives the denominator in (4).


4 Results 

4.1 Assumptions 

The stochastic valuation model described in the previous section is applied to a portfolio of participating contracts. For simplicity, we assume that all policyholders are 40 years old at inception of the contract and that mortality is based on the German standard mortality table (DAV 2008 T). We do not consider surrender. Furthermore, we assume annual charges c_t that are typical in the German market, consisting of annual administration charges β · P throughout the contract’s lifetime and acquisition charges α · T · P, which are equally distributed over the first 5 years of the contract. Hence,

$$c_t = \beta \cdot P + \frac{\alpha \cdot T \cdot P}{5}\,\mathbb{1}_{\{t \in \{0,\dots,4\}\}}.$$

Furthermore, we assume that expenses coincide with the charges. Product parameters are given in Tables 2 and 3.
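The charge formula can be evaluated as below. The default values assume that the two percentage columns of Table 3 are α = 3 % and β = 4 %; this reading of the table header is an assumption, and the function name is hypothetical.

```python
def annual_charge(t, P=896.89, T=20, alpha=0.03, beta=0.04):
    """Charge c_t in contract year t = 0, 1, ..., T-1.

    Administration charge beta * P every year, plus the acquisition charge
    alpha * T * P spread equally over the first five years (t = 0..4).
    """
    acquisition = alpha * T * P / 5.0 if 0 <= t <= 4 else 0.0
    return beta * P + acquisition


charges = [annual_charge(t) for t in range(20)]  # c_0, ..., c_19 over the 20-year term
```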

Stochastic projections are performed for a portfolio that was built up in the past 20 years (i.e., before t = 0) based on 1,000 new policies per year. Hence, we have a


15 Cf. Oechslin et al. [24]. 

16 A description of the current version of the standard formula can be found in [12]. 
Table 3 Product parameters II

| G (€) | T (years) | P (€) | α (%) | β (%) |
|---|---|---|---|---|
| 20,000 | 20 | 896.89 | 3 | 4 |

portfolio at the beginning of the projections with remaining time to maturity between 1 year and 19 years (i.e., τ = 19 years).¹⁷ For each contract, the account value at t = 0 is derived from a projection in a deterministic scenario. In this deterministic scenario, we use a flat yield curve of 3.0 % (consistent with the mean reversion parameter θ of the stochastic model after t = 0) and the parameters for management rules described below. In line with the valuation approach under Solvency II and MCEV, we do not consider new business.

The book value of the asset portfolio at t = 0 coincides with the book value of 
liabilities. We assume a stock ratio of q = 5 % with unrealized gains on stocks at 
t = 0 equal to 10 % of the book value of stocks. The coupon bond portfolio consists 
of bonds with a uniform coupon of 3.0 % where the time to maturity is equally split 
between 1 year and M = 10 years. 

Capital market parameters for the basic and stress projections are shown in Table 4. The parameters κ, σ_r, σ_S, and ρ are directly adopted from Graf et al. [16]. The parameters θ and r_0 are chosen such that they are more in line with the current low interest rate level. The capital market stress corresponds to an immediate drop of interest rates by 100 basis points.

The parameters for the management rules are given in Table 5 and are consistent 
with current regulation and practice in the German insurance market. 

For all projections, the number of scenarios is N = 5,000. Further analyses 
showed that this allows for a sufficiently precise estimation of the relevant figures. 18 
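Footnote 18 mentions antithetic paths. Below is a minimal sketch of the textbook antithetic-variates scheme, assuming each scenario is driven by i.i.d. standard normal draws per projection year; the exact implementation used for the results is not spelled out, and the function name is hypothetical.

```python
import random


def antithetic_normals(n_scenarios, n_steps, seed=1):
    """Standard normal draws for n_scenarios paths of n_steps years each.

    The second half of the paths uses the negated draws of the first half, so
    every path has an antithetic partner and the sample is exactly symmetric.
    """
    if n_scenarios % 2:
        raise ValueError("n_scenarios must be even for antithetic pairs")
    rng = random.Random(seed)
    half = [[rng.gauss(0.0, 1.0) for _ in range(n_steps)] for _ in range(n_scenarios // 2)]
    return half + [[-z for z in path] for path in half]


# N = 5,000 scenarios over the tau = 19 projection years.
draws = antithetic_normals(5000, 19)
```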


Table 4 Capital market parameters

| | r_0 (%) | θ (%) | κ (%) | σ_r (%) | σ_S (%) | ρ (%) |
|---|---|---|---|---|---|---|
| Basic | 2.5 | 3.0 | 30.0 | 2.0 | 20.0 | 15.0 |
| Stress | 1.5 | 2.0 | | | | |

17 Note that due to mortality before t = 0, the number of contracts for the different remaining times 
to maturity is not the same. 

18 In order to reduce the variance of the estimates, antithetic paths are used for the random numbers, cf. e.g., Glasserman [15].
Table 5 Parameters for management rules

| q (%) | M (years) | δ_pos (%) | δ_neg (%) | p (%) |
|---|---|---|---|---|
| 5 | 10 | 20 | 100 | 90 |

4.2 Comparison of Product Designs 

In Table 6, the PVFP and the TVOG for the base case are compared for the three products. All results are displayed as a percentage of the present value of future premium income from the portfolio. For alternative 1, the PVFP increases from 3.63 to 4.24 %, i.e., by 0.61 percentage points (pp), compared to the traditional contract design (which corresponds to a 17 % increase in profitability). This means that this product, with a “maturity only” guarantee and an additional guarantee that the account value will not decrease, is, as expected, more profitable than the product with a traditional year-to-year (cliquet-style) guarantee. This difference is mainly caused by the different degree of asymmetry of the shareholders’ cash flows, which is characterized by the TVOG. Since PVFP_CE amounts to 4.26 % for all products in the base case, the difference in TVOG between the traditional product and alternative 1 is also 0.61 pp. This corresponds to a TVOG reduction of more than 90 % for alternative 1, which shows that the risk resulting from the interest rate guarantee is much lower for the modified product.
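The percentages quoted above can be reproduced from the base-case figures (PVFP and TVOG from Table 6, PVFP_CE = 4.26 %). A small arithmetic check in Python:

```python
# Base case, in % of the present value of premium income (Table 6).
pvfp_table = {"traditional": 3.63, "alternative_1": 4.24, "alternative_2": 4.25}
tvog_table = {"traditional": 0.63, "alternative_1": 0.02, "alternative_2": 0.01}
pvfp_ce = 4.26  # certainty-equivalent PVFP, identical for all products

increase_pp = pvfp_table["alternative_1"] - pvfp_table["traditional"]   # absolute increase in pp
relative_increase = increase_pp / pvfp_table["traditional"]             # relative profitability gain
tvog_reduction = 1 - tvog_table["alternative_1"] / tvog_table["traditional"]

# TVOG = PVFP_CE - PVFP should hold for every product.
tvog_consistent = all(abs(pvfp_ce - pvfp_table[k] - tvog_table[k]) < 1e-9 for k in pvfp_table)
```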

Compared to this, the differences between alternative 1 and alternative 2 are 
almost negligible. The additional increase of the PVFP is only 0.01 pp, which is due 
to a slightly lower TVOG compared to alternative 1. This shows that the fact that 
the account value may decrease in some years in alternative 2 does not provide a 
material additional risk reduction. 

Additional insights can be obtained by analyzing the distribution of PVFP^(n) (see Fig. 4)¹⁹: for the traditional contract design, the distribution is highly asymmetric with a heavy left tail and a significant risk of negative shareholder cash flows (on a present value basis). In contrast, both alternative contract designs exhibit an almost symmetric distribution of shareholder cash flows, which explains the low TVOG. Hence, the new products result in a significantly more stable profit perspective for the shareholders, while for the traditional product the shareholders are exposed to a significantly higher shortfall risk.

Ultimately, the results described above can be traced back to differences in the required yield. While for the traditional product the required yield, by definition, always amounts to 1.75 %, it is equal to 0 % in most scenarios for the alternative 1 product. Only in the most adverse scenarios does the required yield rise toward 1.75 %.²⁰ For the alternative 2 product, it is even frequently negative.


19 Cf. Footnote 13. 

20 Note that here, the required yield in the first projection year reflects the financial buffer available 
for the considered portfolio of existing contracts at t = 0. This is different from the illustrations in 
Sect. 2, which consider individual contracts from inception to maturity. 
Fig. 4 Histogram of PVFP^(n) in base case (horizontal axis: PVFP^(n) in % of present value of premium income)


Table 6 PVFP and TVOG for base case (as percentage of the present value of premium income)

| | Traditional product (%) | Alternative 1 (%) | Alternative 2 (%) |
|---|---|---|---|
| PVFP | 3.63 | 4.24 | 4.25 |
| TVOG | 0.63 | 0.02 | 0.01 |


Apart from the higher profitability, the alternative contract designs also result in a lower capital requirement for interest rate risk. This is illustrated in Table 7, which displays the PVFP under the interest rate stress and the difference to the basic level. Compared to the basic level, the PVFP for the traditional product decreases by 75 %, which corresponds to an SCR_int of 2.73 % of the present value of future premium income. In contrast, the PVFP decreases by only around 40 % for the alternative contract designs, and thus the capital requirement is only 1.66 and 1.65 %, respectively.
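Likewise, the ΔPVFP values of Table 7 and the relative PVFP decreases quoted above follow from a few lines of arithmetic:

```python
# Basic and stress level, in % of the present value of premium income (Table 7).
basic = {"traditional": 3.63, "alternative_1": 4.24, "alternative_2": 4.25}
stress = {"traditional": 0.90, "alternative_1": 2.58, "alternative_2": 2.60}

delta = {k: basic[k] - stress[k] for k in basic}      # ΔPVFP, the SCR_int contribution
decline = {k: delta[k] / basic[k] for k in basic}     # relative decrease of the PVFP
```

The traditional product loses about three quarters of its PVFP under the stress, whereas both alternatives lose roughly 39 %.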

We have seen that a change in the type of guarantee results in a significant increase 
of the PVFP. Further analyses show that a traditional product with guaranteed interest 
rate i = 0.9 % instead of 1.75 % would have the same PVFP (i.e., 4.25 %) as the 
alternative contract designs with i_p = 1.75 %. Hence, although changing only the
type of guarantee and leaving the level of guarantee intact might be perceived as a 
rather small product modification by the policyholder, it has the same effect on the 
insurer’s profitability as reducing the level of guarantee by a significant amount. 

Furthermore, our results indicate that even in an adverse capital market situation 
the alternative product designs may still provide an acceptable level of profitability: 
The profitability of the modified products if interest rates were 50 basis points lower 
roughly coincides with the profitability of the traditional product in the base case. 
Table 7 PVFP for stress level and PVFP difference between basic and stress level

| | Traditional product (%) | Alternative 1 (%) | Alternative 2 (%) |
|---|---|---|---|
| PVFP(basic) | 3.63 | 4.24 | 4.25 |
| PVFP(stress) | 0.90 | 2.58 | 2.60 |
| ΔPVFP | 2.73 | 1.66 | 1.65 |


4.3 Sensitivity Analyses 

In order to assess the robustness of the results presented in the previous section, we 
investigate three different sensitivities: 

1. Interest rate sensitivity: The long-term average θ and initial rate r_0 in Table 4 are replaced by θ = 2.0 %, r_0 = 1.5 % for the basic level, and θ = 1.0 %, r_0 = 0.5 % for the stress level.

2. Stock ratio sensitivity: The stock ratio is set to q = 10 % instead of 5 %.

3. Initial buffer sensitivity: The initial bonus reserve BR_t = AV_t − AR_t is doubled for all contracts.²¹

The results are given in Table 8. 

Interest rate sensitivity If the assumed basic interest rate level is lowered by 
100 basis points, the PVFP decreases and the TVOG increases significantly for all 
products. In particular, the alternative contract designs now also exhibit a significant 
TVOG. This shows that in an adverse capital market situation, the guarantees embedded in the alternative contract designs can also lead to a significant risk for
the shareholder and an asymmetric distribution of profits as illustrated in Fig. 5. 
Nevertheless, the alternative contract designs are still much more profitable and less 
volatile than the traditional contract design and the changes in PVFP/TVOG are 
much less pronounced than for the traditional product: while the TVOG rises from 
0.63 to 2.13%, i.e., by 1.50pp for the traditional product, it rises by only 0.76pp 
(from 0.02 to 0.78 %) for alternative 1. 

As expected, an additional interest rate stress now results in a larger SCR_int. For all product designs, the PVFP after stress is negative and the capital requirement increases significantly. However, as in the base case (cf. Table 7), the SCR_int for the traditional product is more than one percentage point larger than for the new products.

Stock ratio sensitivity The stock ratio sensitivity also leads to a decrease in PVFP and an increase in TVOG for all products. Again, the effect on the PVFP of the traditional product is much stronger: the profit is roughly cut in half (from 3.63 to 1.80 %), while for the alternative 1 product the reduction is much smaller (from 4.24 to 3.83 %), and even smaller for alternative 2 (from 4.25 to 3.99 %). It is noteworthy that with a larger stock ratio of q = 10 % the difference between the two alternative

21 The initial book and market values of the assets are increased proportionally to cover this additional reserve.
Table 8 PVFP, TVOG, PVFP under interest rate stress and ΔPVFP for base case and all sensitivities

| | Traditional product (%) | Alternative 1 (%) | Alternative 2 (%) |
|---|---|---|---|
| Base case | | | |
| PVFP | 3.63 | 4.24 | 4.25 |
| TVOG | 0.63 | 0.02 | 0.01 |
| PVFP(stress) | 0.90 | 2.58 | 2.60 |
| ΔPVFP | 2.73 | 1.66 | 1.65 |
| Interest rate sensitivity | | | |
| PVFP | 0.90 | 2.58 | 2.60 |
| TVOG | 2.13 | 0.78 | 0.76 |
| PVFP(stress) | -4.66 | -1.81 | -1.76 |
| ΔPVFP | 5.56 | 4.39 | 4.36 |
| Stock ratio sensitivity | | | |
| PVFP | 1.80 | 3.83 | 3.99 |
| TVOG | 2.45 | 0.43 | 0.26 |
| PVFP(stress) | -1.43 | 1.65 | 1.92 |
| ΔPVFP | 3.23 | 2.18 | 2.07 |
| Initial buffer sensitivity | | | |
| PVFP | 3.74 | 4.39 | 4.39 |
| TVOG | 0.64 | <0.01 | <0.01 |
| PVFP(stress) | 1.02 | 2.87 | 2.91 |
| ΔPVFP | 2.72 | 1.52 | 1.48 |

products becomes more pronounced, which is reflected by the differences in TVOG. Alternative 2 has a lower shortfall risk than alternative 1, since its account value may decrease in some years as long as it does not fall below the minimum reserve for the maturity guarantee. Hence, we can conclude that the guarantee that the account value may not decrease becomes riskier if asset returns exhibit a higher volatility.

The results for the stressed PVFPs under the stock ratio sensitivity are in line with these results. First, the traditional product requires even more solvency capital: its SCR_int is half a percentage point larger than in the base case (3.23 % compared to 2.73 %), and it is also more than one percentage point larger than for the alternative products with 10 % stocks (2.18/2.07 %). Second, the interest rate stress shows a more substantial difference between the two alternative products. While the difference in SCR_int between alternatives 1 and 2 was 0.01 pp in the base case, it is now 0.11 pp.

Initial buffer sensitivity If the initial buffer is increased, we observe a slight increase in the PVFP for all products. However, there are remarkable differences in the effect on TVOG between the traditional and the alternative products: while for the traditional product the TVOG remains approximately the same, for the alternative products it is essentially reduced to zero. This strongly supports our product
Fig. 5 Histogram of PVFP^(n) for interest rate sensitivity (−100 basis points) (horizontal axis: PVFP^(n) in % of present value of premium income)


motivation in Sect. 2: for the alternative products, larger surpluses from previous years reduce risk in future years.²² Furthermore, the stressed PVFPs imply that the decrease in capital requirement is significantly larger for the alternative products: a 0.14 pp reduction (from 1.66 to 1.52 %) for alternative 1 and a 0.17 pp reduction (from 1.65 to 1.48 %) for alternative 2, compared to just a 0.01 pp reduction (from 2.73 to 2.72 %) for the traditional product.
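As a consistency check on Table 8, ΔPVFP should equal PVFP minus PVFP(stress) in every case, and the SCR_int gap between the traditional product and alternative 1 can be read off directly. A small Python sketch using the table’s values (the helper name is hypothetical):

```python
# Table 8, in % of the present value of premium income.
# Each row: (PVFP, PVFP(stress), reported ΔPVFP) for the traditional product,
# alternative 1 and alternative 2, in that order.
table8 = {
    "base case":      [(3.63, 0.90, 2.73), (4.24, 2.58, 1.66), (4.25, 2.60, 1.65)],
    "interest rate":  [(0.90, -4.66, 5.56), (2.58, -1.81, 4.39), (2.60, -1.76, 4.36)],
    "stock ratio":    [(1.80, -1.43, 3.23), (3.83, 1.65, 2.18), (3.99, 1.92, 2.07)],
    "initial buffer": [(3.74, 1.02, 2.72), (4.39, 2.87, 1.52), (4.39, 2.91, 1.48)],
}


def consistent(rows, tol=0.005):
    """True if every reported ΔPVFP equals PVFP - PVFP(stress) up to rounding."""
    return all(abs((basic - stress) - reported) < tol for basic, stress, reported in rows)


# Reduction of SCR_int (proxied by ΔPVFP) from the traditional product to alternative 1, in pp.
gaps = {case: rows[0][2] - rows[1][2] for case, rows in table8.items()}
```

In every case the gap lies slightly above one percentage point.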


4.4 A Reduction in the Level of Guarantee


So far we have only considered contracts with a different type of guarantee. We will now analyze contracts with a lower level of guarantee, i.e., products where i_p < i_r. If we apply a pricing rate of i_p = 1.25 % instead of 1.75 %, the annual premium required to achieve the same guaranteed maturity benefit rises by approx. 5.4 %, which results in an additional initial buffer for this contract design. For the sake of comparison, we also calculate the results for the traditional product with a lower guaranteed interest rate i = 1.25 %. The respective portfolios at t = 0 are derived using the assumptions described in Sect. 4.1.
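A rough check of the premium increase, assuming level annual premiums paid at the beginning of each year and ignoring mortality and charges (both simplifications made only for this sketch; the function name is hypothetical):

```python
def level_premium(benefit, rate, term):
    """Annual premium, paid at the start of each year, that accumulates to
    `benefit` after `term` years at the pricing rate (no mortality, no charges)."""
    accumulation_factor = sum((1.0 + rate) ** k for k in range(1, term + 1))
    return benefit / accumulation_factor


# Guaranteed benefit G = 20,000 and term T = 20 years from Table 3.
ratio = level_premium(20_000, 0.0125, 20) / level_premium(20_000, 0.0175, 20)
```

Under these simplifications the increase is about 5.6 %; the approx. 5.4 % stated above presumably differs because it accounts for mortality and charges.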

The results are presented in Table 9. We can see that the PVFP is further increased 
and the TVOG is very close to 0 for the modified alternative products, which implies 
an almost symmetric distribution of the PVFP. The TVOG can even become slightly 
negative due to the additional buffer in all scenarios. Although the risk situation for 
the traditional product is also improved significantly due to the lower guarantee, the 


22 From this, we can conclude that if such alternative products had been sold in the past, the risk 
situation of the life insurance industry would be significantly better today in spite of the rather high 
nominal maturity guarantees for products sold in the past. 
Table 9 PVFP, TVOG, PVFP under interest rate stress and ΔPVFP for the alternative products with lower pricing rate

| | Traditional product (%) | Alternative 1 (%) | Alternative 2 (%) | Traditional, i = 1.25 (%) | Alternative 1, i_p = 1.25 (%) | Alternative 2, i_p = 1.25 (%) |
|---|---|---|---|---|---|---|
| PVFP | 3.63 | 4.24 | 4.25 | 4.12 | 4.31 | 4.31 |
| TVOG | 0.63 | 0.02 | 0.01 | 0.14 | -0.05 | -0.05 |
| PVFP(stress) | 0.90 | 2.58 | 2.60 | 2.43 | 3.28 | 3.32 |
| ΔPVFP | 2.73 | 1.66 | 1.65 | 1.69 | 1.03 | 0.99 |


alternative products can still preserve their advantages. A more remarkable effect can be seen for the SCR_int, which amounts to 1.03 and 0.99 % for alternative products 1 and 2, respectively, compared to 1.69 % for the traditional product. Hence, the buffer leads to a significant additional reduction of solvency capital requirements for the alternative products, meaning that these are less affected by interest rate risk.


5 Conclusion and Outlook 


In this paper, we have analyzed different product designs for traditional participating 
life insurance contracts with a guaranteed maturity benefit. A particular focus of our 
analysis was on the impact of product design on capital requirements under risk-based 
solvency frameworks such as Solvency II and on the insurer’s profitability. 

We have performed a market consistent valuation of the different products and 
have analyzed the key drivers of Capital Efficiency, particularly the value of the 
embedded options and guarantees and the insurer’s profitability. 

As expected, our results confirm that products with a typical year-to-year guarantee are rather risky for the insurer and hence result in a rather high capital requirement. Our proposed product modifications significantly enhance Capital Efficiency, reduce the insurer’s risk, and increase profitability. Although the design of the modified products ensures that the policyholder receives less than with the traditional product only in extreme scenarios, these products still provide massive relief for the insurer, since extreme scenarios drive the capital requirements under Solvency II and SST.

It is particularly noteworthy that, starting from a standard product where the guaranteed maturity benefit is based on an interest rate of 1.75 %, changing the type of the guarantee to our modified products (but leaving the level of guarantee intact) has the same impact on profitability as reducing the level of guarantee to an interest rate of 0.9 % and not modifying the type of guarantee. Furthermore, it is remarkable that the reduction of SCR_int from the traditional to the alternative contract design is very robust throughout our base case as well as all sensitivities and always amounts to slightly above one percentage point.
We would like to stress that the product design approach presented in this paper is not model arbitrage (hiding risks in “places the model cannot see”), but a real reduction of economic risks. In our opinion, such concepts can be highly relevant in practice if modified products keep the product features that are perceived and desired by the policyholder, preserve the benefits of intertemporal risk sharing, and do away with those options and guarantees whose existence policyholders are often not even aware of. Similar modifications are also possible for many other old-age provision products like dynamic hybrid products²³ or annuity payout products. Therefore, we expect that the importance of “risk management by product design” will increase. This is particularly the case since, whenever the same pool of assets is used to back new and old products, new capital-efficient products might even help reduce the risk resulting from an “old” book of business by reducing the required yield of the pool of assets.

We, therefore, feel that there is room for additional research: It would be interesting 
to analyze similar product modifications for the annuity payout phase. Also — since 
many insurers have sold the traditional product in the past — an analysis of a change 
in new business strategy might be worthwhile: How would an insurer’s risk and 
profitability change and how would the modified products interact with the existing 
business if the insurer has an existing (traditional) book of business in place and 
starts selling modified products today? 

Another interesting question is how the insurer’s optimal strategic asset allocation 
changes if modified products are being sold: If typical criteria for determining an 
optimal asset allocation are given (e.g., maximizing profitability under the restriction 
that some shortfall probability or expected shortfall is not exceeded), then the ceteris paribus lower risk of the modified products might allow for a riskier asset allocation, and
hence also higher expected profitability for the insurer and higher expected surplus 
for the policyholder. So, if this dimension is also considered, the policyholder would 
be compensated for the fact that he receives a weaker type of guarantee. 

Finally, our analysis so far has disregarded the demand side. If some insurers keep selling the traditional product type, there should be little demand for the alternative product designs with reduced guarantees unless they provide some additional benefits. Therefore, the insurer might share the reduced cost of capital with the policyholder, also resulting in higher expected benefits in the alternative product designs.

Since traditional participating life insurance products play a major role in old-age 
provision in many countries and since these products have come under strong pressure 
in the current interest environment and under risk-based solvency frameworks, the 
concept of Capital Efficiency and the analysis of different product designs should be 
of high significance for insurers, researchers, and regulators to identify sustainable 
life insurance products. In particular, we would hope that legislators and regulators 
would embrace sustainable product designs where the insurer’s risk is significantly 
reduced, but key product features as perceived and requested by policyholders are 
still present. 


23 Cf. Kochanski and Kamarski [22].


Participating Life Insurance Contracts under Risk Based Solvency Frameworks . . . 


207 


Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 


References 


1. Allianz. Presentation Allianz Capital Markets Day, 2013. Available at https://www.allianz.com/v_1372138505000/media/investor_relations/en/conferences/capital_markets_days/documents/2013_allianz_cmd.pdf

2. BaFin. Ergebnisse der fünften quantitativen Auswirkungsstudie zu Solvency II (QIS 5), 2011. Available at http://www.bafin.de/SharedDocs/Downloads/DE/Versicherer_Pensionsfonds/QIS/dl_qis5_ergebnisse_bericht_bafin.pdf?__blob=publicationFile&v=8

3. Barbarin, J., Devolder, P.: Risk measure and fair valuation of an investment guarantee in life insurance. Insur.: Math. Econ. 37(2), 297-323 (2005)

4. Bauer, D., Kiesel, R., Kling, A., Ruß, J.: Risk-neutral valuation of participating life insurance contracts. Insur.: Math. Econ. 39(2), 171-183 (2006)

5. Bauer, D., Reuß, A., Singer, D.: On the calculation of solvency capital requirement based on nested simulations. ASTIN Bull. 42(2), 453-499 (2012)

6. Bergmann, D.: Nested Simulations in Life Insurance. PhD thesis, University of Ulm (2011) 

7. Branger, N., Schlag, C.: Zinsderivate. Modelle und Bewertung, Berlin (2004) 

8. Briys, E., de Varenne, F.: On the risk of insurance liabilities: debunking some common pitfalls. 
J. Risk Insur. 64(4), 637-694 (1997) 

9. CFO-Forum. Market Consistent Embedded Value Principles, 2009. Available at http://www.cfoforum.nl/downloads/MCEV_Principles_and_Guidance_October_2009.pdf

10. DAV. DAV Fachgrundsatz zum Market Consistent Embedded Value. Köln (2011)

11. EIOPA. EIOPA Report on the fifth Quantitative Impact Study (QIS5) for Solvency II, 2011. Available at http://eiopa.europa.eu/fileadmin/tx_dam/files/publications/reports/QIS5_Report_Final.pdf

12. EIOPA. Technical Specifications on the Long Term Guarantee Assessment, 2013. Available at https://eiopa.europa.eu/consultations/qis/insurance/long-term-guarantees-assessment/technical-specifications/index.html

13. Gatzert, N.: Asset management and surplus distribution strategies in life insurance: an examination with respect to risk pricing and risk measurement. Insur.: Math. Econ. 42(2), 839-849 (2008)

14. Gatzert, N., Kling, A.: Analysis of participating life insurance contracts: a unification approach.
J. Risk Insur. 74(3), 547-570 (2007) 

15. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (1994)

16. Graf, S., Kling, A., Ruß, J.: Risk analysis and valuation of life insurance contracts: combining actuarial and financial approaches. Insur.: Math. Econ. 49(1), 115-125 (2011)

17. Grosen, A., Jørgensen, P.: Fair valuation of life insurance liabilities: the impact of interest rate guarantees, surrender options, and bonus policies. Insur.: Math. Econ. 26(1), 37-57 (2000)

18. Grosen, A., Jørgensen, P.: Life insurance liabilities at market value: an analysis of insolvency risk, bonus policy, and regulatory intervention rules in a barrier option framework. J. Risk Insur. 69(1), 63-91 (2002)

19. Grosen, A., Jensen, B., Jørgensen, P.: A finite difference approach to the valuation of path dependent life insurance liabilities. Geneva Pap. Risk Insur. Theory 26, 57-84 (2001)

20. Kling, A., Richter, A., Ruß, J.: The impact of surplus distribution on the risk exposure of with profit life insurance policies including interest rate guarantees. J. Risk Insur. 74(3), 571-589 (2007)

21. Kling, A., Richter, A., Ruß, J.: The interaction of guarantees, surplus distribution, and asset allocation in with-profit life insurance policies. Insur.: Math. Econ. 40(1), 164-178 (2007)




A. Reuß et al.


22. Kochanski, M., Kamarski, B.: Solvency capital requirement for hybrid products. Eur. Actuar. 
J. 1(2), 173-198 (2011) 

23. Miltersen, K., Persson, S.-A.: Guaranteed investment contracts: distributed and undistributed excess return. Scand. Actuar. J. 103(4), 257-279 (2003)

24. Oechslin, J., Aubry, O., Aellig, M.: Replicating embedded options. Life Pensions pp. 47-52 
(2007) 

25. Saxer, W.: Versicherungsmathematik. Springer, Berlin (1955) 

26. Seyboth, M.: Der Market Consistent Appraisal Value und seine Anwendung im Rahmen der wertorientierten Steuerung von Lebensversicherungsunternehmen. PhD thesis, University of Ulm (2011)

27. Vasicek, O.: An equilibrium characterization of the term structure. J. Financ. Econ. 5(2), 177- 
188 (1977) 

28. Wolthuis, H.: Life Insurance Mathematics. CAIRE, Brussels (1994) 

29. Zaglauer, K., Bauer, D.: Risk-neutral valuation of participating life insurance contracts in a 
stochastic interest rate environment. Insur.: Math. Econ. 43(1), 29-40 (2008) 


Reducing Surrender Incentives Through Fee 
Structure in Variable Annuities 

Carole Bernard and Anne MacKay 


Abstract In this chapter, we study the effect of the fee structure of a variable annuity 
on the embedded surrender option. We compare the standard fee structure offered 
in the industry (fees set as a fixed percentage of the variable annuity account) with 
periodic fees set as a fixed, deterministic amount. Surrender charges are also taken 
into account. Under fairly general conditions on the premium payments, surrender 
charges and fee schedules, we identify conditions under which it is never optimal for the
policyholder to surrender. Solving partial differential equations using finite difference 
methods, we present numerical examples that highlight the effect of a combination 
of surrender charges and deterministic fees in reducing the value of the surrender 
option and raising the optimal surrender boundary. 


1 Introduction 


A variable annuity (VA) is a unit-linked insurance product that guarantees a certain amount at one or more future dates. Usually, the policyholder pays an initial premium for the contract, which is invested in a mutual fund chosen by the policyholder. There are different kinds of VAs, defined by the type of guarantees embedded in the contract (for more details, see Hardy [9]). In this paper, we focus on a variable annuity contract that pays the maximum of the mutual fund value and a guaranteed amount at maturity. This type of VA is referred to as a guaranteed minimum accumulation benefit (GMAB) (see Bauer et al. [1]).

Typically, the fee that covers the management of the VA and the embedded financial guarantees is set as a constant percentage of the VA account and withdrawn directly from it at regular intervals. When the account value is high, the financial guarantee is worth very little, but the fee is still paid at the same percentage. This mismatch gives the policyholder an incentive to surrender the contract and take the amount accumulated in the account. Such surrenders represent an important risk for VA issuers, since the expenses linked to the sale of the policy are typically recovered through the fees collected over the duration of the contract. As shown by Kling et al. [11], unexpected surrenders also compromise the efficiency of dynamic hedging strategies.

There are various ways to reduce the incentive to surrender a VA contract with guarantees. For example, insurance companies usually impose surrender charges, which reduce the amount available at surrender. Milevsky and Salisbury [13] argue that these charges are necessary for VA contracts to be both hedgeable and marketable. The design of VA benefits can also discourage policyholders from surrendering. For example, Kling et al. [11] discuss the impact of ratchet options (the possibility to reset the maturity guarantee as the fund value increases) in convincing policyholders to keep the VA alive. Yet another way to reduce the incentive to surrender is to modify the way fees are paid from the VA account. As explained above, the typical constant percentage fee structure leads to a mismatch between the fee paid and the value of the financial guarantee, which can discourage the policyholder from staying in the contract.¹ By reducing the fee paid when the value of the financial guarantee is low, it is possible to reduce the value of the real option to surrender embedded in a VA. The new fee structure can take different forms. For example, Bernard et al. [2] suggest setting a certain account value above which no fee is paid, and show that this modifies the rational policyholder's surrender incentive. In this paper, we explore another fee structure in which part (or all) of the fee is paid as a deterministic periodic amount. The intuition behind this fee structure is that this amount represents a lower percentage of the account value as the value of the financial guarantee decreases (that is, as the account value increases). This affects the surrender incentive and reduces the additional value created by the possibility to surrender the contract.

To explore the effect of the deterministic fee amount on the surrender incentive, we consider a VA with a simple GMAB. We assume that the total fee withdrawn from the VA account throughout the term of the contract is the sum of a fixed percentage $c$ of the account value and a deterministic, pre-determined amount $p_t$ at time $t$ (in other words, the deterministic amount need not be constant).² Our paper constitutes a significant extension of the results obtained on the optimal surrender strategy for a fee set as a fixed percentage of the fund [4], since the deterministic fee structure increases the complexity of the dynamics of the VA account value. For this reason, we need to resort to PDE methods to obtain the optimal surrender strategy when a portion of the fee is set as a deterministic amount. This paper also extends the work done on state-dependent fee structures, since Bernard et al. [2] do not quantify the reduction in the surrender incentive resulting from the new fee structure.

1 Specifically, the policyholder has the option to surrender the contract and to receive a "surrender benefit", which can be more valuable than the contract itself. This additional value, as well as the optimal surrender strategy, is explored and quantified by Bernard, MacKay, and Muehlbeyer in [4] in the case when the fees are paid as a percentage of the underlying fund.

2 Note that the deterministic amount component of the fee can be interpreted as a variable percentage of the account value $F_t$. In fact, let $\tilde{p}$ denote the percentage of the fund value that yields the same fee amount as the deterministic amount $p_t$. Then $\tilde{p}$ is a function of time and of the fund value $F_t$, and can be computed as $\tilde{p}(t, F_t) = p_t/F_t$, so that $\tilde{p}(t, F_t)F_t = p_t$ is the fee paid at time $t$.

Throughout the paper, our main goal is to investigate the impact of the deterministic fee amount on the value of the surrender option. In Sect. 2, we describe the model and the VA contract. Section 3 introduces a theoretical result and discusses the valuation of the surrender option. Numerical examples are presented in Sect. 4, and Sect. 5 concludes.


2 Assumptions and Model 

Consider a market with a bank account yielding a constant risk-free rate $r$ and an index evolving as in the Black-Scholes model, so that

$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dW_t$$

under the risk-neutral measure $Q$, where $\sigma > 0$ is the constant instantaneous volatility of the index. Let $\mathcal{F}_t$ be the natural filtration associated with the Brownian motion $W_t$.

In this paper, we use a Black-Scholes setting since its simplicity allows us to compute prices explicitly, and thus to study the surrender incentive precisely. More realistic market models could be considered, but they would require Monte Carlo methods or more advanced numerical methods. Since the focus of this paper is on the surrender incentive, we believe that the Black-Scholes approximation of market dynamics is sufficient to provide insight into the effect of the deterministic amount fee structure.


2.1 Variable Annuity 

We consider a VA contract with an underlying fund fully invested in the index $S$. At time $t$, we assume that the fee paid is the sum of a constant percentage $c > 0$ of the account value and a deterministic amount $p_t$. Setting $p_t = 0$, we recover the results commonly found in the literature, where the fee is paid only as a percentage of the fund (see, for example, [4]).

The motivation to study periodic deterministic fees is that the surrender incentives 
when the fees are paid as a fixed percentage of the fund are larger than when the fees 
are set as a deterministic amount. This will be illustrated via numerical examples in 
Sect. 4. 

We further assume that the policyholder invests $P_0$ at time 0 and that regular additional premiums $a_t$ are paid at time $t$. Additional contributions are common in variable annuities, but they are often neglected in the literature, and most academic research focuses on the simpler single premium case. When additional contributions can be made to the account over time, VAs are called Flexible Premium Variable Annuities (FPVAs). Chi and Lin [7] provide examples of such VAs, where the policyholder is given the choice between a single premium and a periodic monthly payment in addition to some initial lump sum. Analytical formulae for the value of such contracts can be found in [8, 10]. In the first part of this chapter, we show how flexible premium payments influence the surrender value.

We assume that all premiums paid at time 0 and at later times $t$ are invested in the fund, and that all fees (percentage or fixed) are taken from the fund. We need to model the dynamics of the fund; our approach is inspired by Chi and Lin [7]. For the sake of simplicity, we assume that all cash flows happen in continuous time, so that a fixed payment of $A$ at time 1 (say, the end of the year) is treated like a payment made continuously over the interval $[0, 1]$. Due to the presence of the risk-free rate $r$, an amount $A$ paid at time 1 is equivalent to instantaneous contributions $a_t\,dt$ at each time $t \in (0, 1]$ such that

$$A = \int_0^1 a_t e^{r(1-t)}\,dt.$$

By abuse of notation, if $a_t$ is constant over the year, we say that $a_t$ is the annual rate of contribution (although there is no compounding effect).
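A minimal Python sketch of this convention, assuming a constant rate $a$ over the year: solving $A = \int_0^1 a\,e^{r(1-t)}\,dt$ gives $a = A\,r/(e^{r}-1)$, which tends to $A$ as $r \to 0$. The helper names `continuous_rate` and `accumulated` are hypothetical and only serve to check the formula numerically.

```python
import math


def continuous_rate(A, r):
    """Constant rate a with int_0^1 a e^{r(1-t)} dt = A, i.e. a = A r / (e^r - 1)."""
    if abs(r) < 1e-12:
        return A                        # no interest: rate equals the annual amount
    return A * r / math.expm1(r)        # expm1(r) = e^r - 1, accurate for small r


def accumulated(a, r, n=100_000):
    """Midpoint-rule value of int_0^1 a e^{r(1-t)} dt, used as a numerical check."""
    h = 1.0 / n
    return sum(a * math.exp(r * (1.0 - (k + 0.5) * h)) * h for k in range(n))
```

For instance, `accumulated(continuous_rate(1000.0, 0.03), 0.03)` recovers the annual amount of 1000 up to quadrature error; the continuous rate is slightly below the annual amount because early contributions earn interest until time 1.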

Specifically, the dynamics of the fund can be written as follows:

$$dF_t = (r-c)F_t\,dt + \sigma F_t\,dW_t + a_t\,dt - p_t\,dt,$$

with $F_0 = P_0$, where $F_t$ denotes the value of the fund at time $t$, $a_t$ is the annual rate of contributions, $c$ is the annual rate of fees, and $p_t$ is the annual amount of fees paid for the options. As in [7], it is straightforward to show that

$$F_t = F_0 e^{(r-c-\frac{\sigma^2}{2})t+\sigma W_t} + \int_0^t (a_s - p_s)\, e^{(r-c-\frac{\sigma^2}{2})(t-s)+\sigma(W_t-W_s)}\,ds, \quad t \ge 0,$$

that is,

$$F_t = F_0\frac{S_t}{S_0}e^{-ct} + \int_0^t (a_s - p_s)\, e^{-c(t-s)}\frac{S_t}{S_s}\,ds. \qquad (1)$$

In particular, taking $P_0 = F_0 = S_0$ and writing $b_s = a_s - p_s$ to simplify the notation, we have

$$F_t = e^{-ct}S_t + \int_0^t b_s\, e^{-c(t-s)}\frac{S_t}{S_s}\,ds, \qquad (2)$$

where $b_s$ can take any value in $\mathbb{R}$. In the case of regular contributions, $b_s$ is typically positive, but it can also be negative, for example in the single premium case or if the regular premiums are very low. We split $b_s$ into contributions $a_s$ and deterministic fees $p_s$ when needed for the interpretation of the results.
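The representation (2) can be checked by simulation. The Python sketch below, assuming a constant $b = a - p$, $F_0 = S_0$, and a left Riemann sum for the integral on a time grid, simulates $F_T$ and compares its sample mean with $\mathbb{E}[F_T] = F_0 e^{(r-c)T} + b\,(e^{(r-c)T}-1)/(r-c)$, which follows from $\mathbb{E}[S_T/S_s] = e^{r(T-s)}$. The function names and parameter values are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def simulate_fund_T(F0, r, c, sigma, b, T, n_steps=100, n_paths=10_000, seed=0):
    """Simulate F_T via representation (2), assuming F_0 = S_0 and constant b = a - p."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # log-increments of S under Q: (r - sigma^2 / 2) dt + sigma dW
    incr = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    log_path = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(incr, axis=1)], axis=1)
    S = F0 * np.exp(log_path)                     # S at t_0 = 0, ..., t_n = T
    s = np.linspace(0.0, T, n_steps + 1)[:-1]     # left endpoints of the time cells
    S_T = S[:, -1]
    # left Riemann sum of b e^{-c(T - s)} S_T / S_s over [0, T]
    integral = S_T * np.sum(b * np.exp(-c * (T - s)) * dt / S[:, :-1], axis=1)
    return np.exp(-c * T) * S_T + integral


def expected_fund_T(F0, r, c, b, T):
    """E[F_T] = F0 e^{(r-c)T} + b (e^{(r-c)T} - 1) / (r - c), using E[S_T / S_s] = e^{r(T-s)}."""
    k = r - c
    if abs(k) < 1e-12:
        return F0 + b * T
    return F0 * np.exp(k * T) + b * np.expm1(k * T) / k
```

A deterministic fee enters with $b < 0$; for example, `simulate_fund_T(100.0, 0.03, 0.01, 0.2, -2.0, 1.0).mean()` should lie close to `expected_fund_T(100.0, 0.03, 0.01, -2.0, 1.0)`, within Monte Carlo error.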

This formulation can be seen as an extension of the case studied in [7], where the contribution rate is assumed to be constant ($a_t = a$ for all $t$) and there are no periodic fees ($p_t = 0$). It is clear from (2) that the fund value becomes path-dependent and involves a continuous arithmetic average. Without loss of generality, let $F_0 = S_0$.


2.2 Benefits 

We assume that there is a guaranteed minimum accumulation rate $g < r$ on all the contributions of the policyholder until time $t$, so that the accumulated guaranteed benefit $G_t$ at time $t$ has dynamics

$$dG_t = gG_t\,dt + a_t\,dt,$$

where $G_0 = P_0$ at time 0. Thus, at time $t$ the guaranteed amount $G_t$ can be expressed as

$$G_t = P_0 e^{gt} + \int_0^t a_s e^{g(t-s)}\,ds.$$

When the annual rate of contribution is constant ($a_t = a$), the guaranteed value simplifies to

$$G_t = P_0 e^{gt} + \frac{a}{g}\left(e^{gt} - 1\right), \qquad g \neq 0,$$

and to $G_t = P_0 + at$ when $g = 0$.

Chi and Lin [7] develop techniques to price and hedge the guarantee at time $t$. Using their numerical approach, it is possible to estimate the fair fee for the European VA (Proposition 3 in their paper).

As in [4, 13], we assume that the policyholder has the option to surrender the policy at any time $t$ and to receive a surrender benefit equal to

$$(1-\kappa_t)F_t,$$

where $\kappa_t$ is the penalty percentage charged for surrendering at time $t$. As presented, for instance, in [3, 13] or [15], a standard surrender penalty decreases over time. Typical VAs sold in the US have a surrender charge period. In general, the maximum surrender charge is around 8 % of the account value and decreases during the surrender charge period. A typical example is New York Life's Premier Variable Annuity [14], for which the surrender charge starts at 8 % in the first contract year and decreases by 1 % per year to reach 2 % in year 7. From year 8 on, there is no penalty on surrender. In another example, "the surrender charge is 7 % during the first Contract Year and decreases by 1 % each subsequent Contract Year. No surrender charge is deducted for surrenders occurring in Contract Years 8 and later" [17].
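For constant contributions $a$, the guaranteed amount is $G_t = P_0 e^{gt} + \frac{a}{g}(e^{gt}-1)$, reducing to $P_0 + at$ when $g = 0$ (the case used in Sect. 4). The Python sketch below evaluates this closed form and cross-checks it against a forward-Euler integration of $dG_t = gG_t\,dt + a\,dt$; both helper names are hypothetical.

```python
import math


def guaranteed_amount(t, P0, g, a):
    """G_t for constant contributions a: P0 e^{gt} + (a/g)(e^{gt} - 1), or P0 + a t if g = 0."""
    if abs(g) < 1e-12:
        return P0 + a * t
    return P0 * math.exp(g * t) + a * math.expm1(g * t) / g


def guaranteed_amount_euler(t, P0, g, a, n=100_000):
    """Forward-Euler integration of dG = g G dt + a dt with G_0 = P0 (cross-check only)."""
    h = t / n
    G = P0
    for _ in range(n):
        G += (g * G + a) * h
    return G
```

Using `math.expm1` keeps the $g \to 0$ limit numerically smooth, so very small roll-up rates give values close to $P_0 + at$.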


3 Valuation of the Surrender Option 

In this section, we discuss the valuation of the variable annuity contract with maturity benefit and surrender option.³ We first present a sufficient condition that eliminates the possibility of optimal surrender. We then explain how we evaluate the surrender option using partial differential equations (PDEs). We consider a variable annuity contract with a maturity benefit only, which can be surrendered. We choose to ignore the death benefits that are typically added to this type of contract, since our goal is to analyze the effect of the fee structure on the value of the surrender option.


3.1 Notation and Optimal Surrender Decision 

We denote by $v(t, F_t)$ and $V(t, F_t)$ the value of the contract without and with the surrender option, respectively. In this paper, we ignore death benefits and assume that the policyholder survives to maturity.⁴ Thus, the value of the contract without the surrender option is simply the risk-neutral expectation of the discounted payoff at maturity, conditional on the information available at time $t$:

$$v(t, F_t) = \mathbb{E}\left[e^{-r(T-t)}\max(G_T, F_T) \mid \mathcal{F}_t\right]. \qquad (3)$$

We assume that the difference between the value of the full contract and that of the maturity benefit is attributable only to the surrender option, which we denote by $e(t, F_t)$. Then we have the following decomposition:

$$V(t, F_t) = v(t, F_t) + e(t, F_t). \qquad (4)$$

The value of the contract with surrender option is calculated assuming that the policyholder surrenders optimally. This means that the contract is surrendered as soon as its value drops below the value of the surrender benefit. To express the total value of the variable annuity contract, we must introduce further notation. We denote by $\mathcal{T}_t$ the set of all stopping times $\tau$ greater than $t$ and bounded by $T$. Then, we can express the continuation value of the VA contract as


3 In this paper, we quantify the value added by the possibility for the policyholder to surrender 
his policy. We call it the surrender option, as in [13]. It is not a guarantee that can be added to the 
variable annuity, but rather a real option created by the fact that the contract can be surrendered. 

4 See, for instance, [2] for a treatment of how to incorporate mortality benefits.
$$V^*(t, F_t) = \sup_{\tau \in \mathcal{T}_t} \mathbb{E}\left[e^{-r(\tau-t)}\psi(\tau, F_\tau) \mid \mathcal{F}_t\right],$$

where

$$\psi(t, x) = \begin{cases} (1-\kappa_t)\,x, & \text{if } t < T,\\ \max(G_T, x), & \text{if } t = T, \end{cases}$$

is the payoff of the contract at surrender or maturity. Finally, we let $\mathcal{S}_t$ be the optimal surrender region at time $t \in [0, T]$. The optimal surrender region is given by the fund values for which the surrender benefit is worth more than the VA contract if the policyholder continues to hold it for at least a small amount of time. Mathematically, it is defined by

$$\mathcal{S}_t = \{F_t : V^*(t, F_t) < \psi(t, F_t)\}.$$

The complement of the optimal surrender region $\mathcal{S}_t$ will be referred to as the continuation region. We also define $B_t$, the optimal surrender boundary at time $t$, by

$$B_t = \inf\{F_t \in [0, \infty) : F_t \in \mathcal{S}_t\}.$$


3.2 Theoretical Result on Optimal Surrender Behavior 

According to (2), the account value $F_t$ can be written as follows at time $t$:

$$F_t = e^{-ct}S_t + \int_0^t b_s e^{-c(t-s)}\frac{S_t}{S_s}\,ds, \quad t \ge 0,$$

and at time $t + dt$ it is equal to

$$F_{t+dt} = e^{-c(t+dt)}S_{t+dt} + \int_0^{t+dt} b_s e^{-c(t+dt-s)}\frac{S_{t+dt}}{S_s}\,ds.$$

Proposition 3.1 (Sufficient condition for no surrender) For a fixed time $t \in [0, T]$, a sufficient condition to eliminate the surrender incentive at time $t$ is given by

$$(\kappa_t' + (1-\kappa_t)c)F_t < b_t(1-\kappa_t), \qquad (5)$$

where $\kappa_t' = d\kappa_t/dt$. Here are some special cases of interest:

• When $a_t = p_t = 0$ (no periodic investment, no periodic fee) and $\kappa_t = 1 - e^{-\kappa(T-t)}$ (the situation considered in [4]), then $b_t = 0$ and (5) becomes $\kappa > c$.

• When $a_t = 0$ (no periodic investment, i.e., a single lump sum paid at time 0), then $b_t = -p_t$. Assume that $p_t > 0$, so that $b_t < 0$. Then:

- If $\kappa_t' + (1-\kappa_t)c > 0$ (for example, if $\kappa_t$ is constant), then the condition can never be satisfied and no conclusion can be drawn.

- If $\kappa_t' + (1-\kappa_t)c < 0$, then it is not optimal to surrender when

$$F_t > \frac{-p_t(1-\kappa_t)}{\kappa_t' + (1-\kappa_t)c}.$$

When $\kappa_t = \kappa$ and $b_t = b$ are constant over time, condition (5) can be rewritten as

$$F_t < \frac{b(1-\kappa)}{c(1-\kappa)} = \frac{b}{c}.$$

Remark 3.1 Proposition 3.1 shows that, in the absence of periodic fees and investment, an insurer can easily ensure that it is never optimal to surrender by choosing a surrender charge equal to $1 - e^{-\kappa(T-t)}$ at time $t$, with a penalty parameter $\kappa$ higher than the percentage fee $c$. Proposition 3.1 also shows that it is possible to eliminate the surrender incentive when there are periodic fees and investment opportunities, but the conditions are more complicated.
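Condition (5) is easy to evaluate pointwise. The Python sketch below assumes the exponential charge $\kappa_t = 1 - e^{-\kappa(T-t)}$ (so that $\kappa_t' = -\kappa e^{-\kappa(T-t)}$) and returns whether the sufficient condition holds; the names `exp_charge` and `no_surrender` are hypothetical. With $b_t = 0$ it reduces to checking $\kappa > c$, and with a constant charge it reduces to $F_t < b/c$, matching the special cases listed above.

```python
import math


def exp_charge(t, T, kappa):
    """Surrender charge kappa_t = 1 - e^{-kappa (T - t)} and its time derivative kappa_t'."""
    decay = math.exp(-kappa * (T - t))
    return 1.0 - decay, -kappa * decay


def no_surrender(F, c, b, kappa_t, dkappa_t):
    """Sufficient condition (5): (kappa_t' + (1 - kappa_t) c) F < b (1 - kappa_t)."""
    return (dkappa_t + (1.0 - kappa_t) * c) * F < b * (1.0 - kappa_t)
```

For example, with $\kappa = 0.5\,\%$ and no periodic cash flows, `no_surrender(100.0, 0.004, 0.0, *exp_charge(2.0, 10.0, 0.005))` holds, whereas raising the fee to $c = 0.6\,\%$ makes it fail.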

Proof Consider a time $t$ at which it is optimal to surrender. This implies that for any time interval of length $dt > 0$, it is better to surrender at time $t$ than to wait until time $t + dt$. In other words, the surrender benefit at time $t$ must be at least equal to the expected discounted value of the contract at time $t + dt$, and in particular at least equal to the expected discounted surrender benefit at time $t + dt$. Thus

$$(1-\kappa_t)F_t \ge \mathbb{E}\left[e^{-r\,dt}(1-\kappa_{t+dt})F_{t+dt} \mid \mathcal{F}_t\right].$$

Using the martingale property of the discounted stock price and the independence of the increments of the Brownian motion, we know that $\mathbb{E}[e^{-r\,dt}S_{t+dt} \mid \mathcal{F}_t] = S_t$ and, for $s \in [t, t+dt]$, $\mathbb{E}[e^{-r\,dt}S_{t+dt}/S_s \mid \mathcal{F}_t] = e^{-r(s-t)}$. Thus

$$\mathbb{E}\left[e^{-r\,dt}F_{t+dt} \mid \mathcal{F}_t\right] = e^{-c(t+dt)}S_t + \int_0^t b_s e^{-c(t+dt-s)}\frac{S_t}{S_s}\,ds + \int_t^{t+dt} b_s e^{-c(t+dt-s)}e^{-r(s-t)}\,ds$$

$$= e^{-c\,dt}F_t + e^{-c\,dt}\int_t^{t+dt} b_s e^{-c(t-s)}\,ds + o(dt), \qquad (6)$$

where we used $e^{-r(s-t)} = 1 + O(dt)$ on $[t, t+dt]$. Thus

$$(1-\kappa_t)F_t \ge (1-\kappa_{t+dt})\left[e^{-c\,dt}F_t + e^{-c\,dt}\int_t^{t+dt} b_s e^{-c(t-s)}\,ds\right] + o(dt).$$

We then use $\kappa_{t+dt} = \kappa_t + \kappa_t'\,dt + o(dt)$, $e^{-c\,dt} = 1 - c\,dt + o(dt)$ and, assuming $b$ is continuous at $t$, $\int_t^{t+dt} b_s e^{-c(t-s)}\,ds = b_t\,dt + o(dt)$ to obtain

$$(1-\kappa_t)F_t \ge (1-\kappa_t-\kappa_t'\,dt)\big((1-c\,dt)F_t + (1-c\,dt)b_t\,dt\big) + j(dt),$$

which can be further simplified into

$$(\kappa_t' + (1-\kappa_t)c)F_t\,dt \ge b_t(1-\kappa_t)\,dt + j(dt), \qquad (7)$$

where the function $j(dt)$ is $o(dt)$. Since this holds for any $dt > 0$, we can divide (7) by $dt$ and take the limit as $dt \to 0$. Then, we get that if it is optimal to surrender the contract at time $t$, then

$$(\kappa_t' + (1-\kappa_t)c)F_t \ge b_t(1-\kappa_t).$$

It follows that if $(\kappa_t' + (1-\kappa_t)c)F_t < b_t(1-\kappa_t)$, it is not optimal to surrender the contract at $t$. □
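The key step of the proof, the first-order expansion of the expected discounted surrender benefit, can be checked numerically. The Python sketch below assumes a constant $b$, $c \neq r$ and the charge $\kappa_t = 1 - e^{-\kappa(T-t)}$. It uses the exact conditional mean $\mathbb{E}[e^{-rh}F_{t+h}\mid\mathcal{F}_t] = e^{-ch}F_t + b\,(e^{-rh}-e^{-ch})/(c-r)$ for a step $h$ and compares the resulting difference quotient with the limit $b(1-\kappa_t) - (\kappa_t' + (1-\kappa_t)c)F_t$. Surrender at $t$ can only be optimal if this limit is non-positive. Function names and parameter values are hypothetical.

```python
import math


def drift_of_surrender_benefit(F, t, h, T, kappa, c, r, b):
    """[E(e^{-rh}(1 - kappa_{t+h}) F_{t+h} | F_t) - (1 - kappa_t) F_t] / h, constant b, c != r."""
    def one_minus_kappa(s):
        return math.exp(-kappa * (T - s))   # 1 - kappa_s for kappa_s = 1 - e^{-kappa (T - s)}

    # exact conditional mean of the discounted fund value after a step h
    mean_next = math.exp(-c * h) * F + b * (math.exp(-r * h) - math.exp(-c * h)) / (c - r)
    return (one_minus_kappa(t + h) * mean_next - one_minus_kappa(t) * F) / h


def drift_limit(F, t, T, kappa, c, b):
    """Limit h -> 0 from the proof: b (1 - kappa_t) - (kappa_t' + (1 - kappa_t) c) F."""
    one_minus_k = math.exp(-kappa * (T - t))
    dkappa = -kappa * one_minus_k          # kappa_t' = -kappa e^{-kappa (T - t)}
    return b * one_minus_k - (dkappa + one_minus_k * c) * F
```

A positive limit means that waiting a short time is worth more than surrendering, which is exactly condition (5).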


3.3 Valuation of the Surrender Option Using PDEs 

To evaluate the surrender option $e(t, F_t)$, we subtract the value of the maturity benefit from the value of the VA contract. The maturity benefit and the VA contract can be compared to European and American options, respectively: the maturity benefit is only triggered when the contract expires, while the contract with surrender option can be exercised at any time before maturity.

From now on, we assume that the deterministic fee is constant over time, so that $p_t = p$ for any time $t$. We also assume that the policyholder makes no contribution after the initial premium (so that $a_t = 0$ for any $t$).

It is well known⁵ that the value of a European contingent claim on the fund value $F_t$ satisfies the PDE

$$\frac{\partial v}{\partial t} + \frac{1}{2}\sigma^2F_t^2\frac{\partial^2 v}{\partial F_t^2} + \big(F_t(r-c) - p\big)\frac{\partial v}{\partial F_t} - rv = 0. \qquad (8)$$

Note that Eq. (8) is very similar to the Black-Scholes equation for a contingent claim on a stock that pays dividends (here, the constant fee $c$ plays the role of the dividend yield), with the addition of the term $-p\,\partial v/\partial F_t$ resulting from the presence of a deterministic fee. Since it represents the contract described in Sect. 2, Eq. (8) is subject to the following conditions:

$$v(T, F_T) = \max(G_T, F_T),$$

$$\lim_{F_t \to 0} v(t, F_t) = G_T e^{-r(T-t)}.$$

The last condition results from the fact that when the fund value is very low, the guarantee is certain to be triggered. When $F_t \to \infty$, the value is unbounded. However, we have the following asymptotic behavior:

$$\lim_{F_t \to \infty}\Big(v(t, F_t) - \mathbb{E}\left[e^{-r(T-t)}F_T \mid \mathcal{F}_t\right]\Big) = 0, \qquad (9)$$

which stems from the value of the guarantee approaching 0 for very high fund values. We use this asymptotic result to solve the PDE numerically when truncating the grid of values for $F_t$. The expectation in (9) is easily calculated and is given in the proof of Proposition 3.1.
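The expectation in (9) has a closed form when $a_t = 0$ and $p_t = p$ is constant. Since $\mathbb{E}[e^{-r(T-t)}S_T/S_s \mid \mathcal{F}_t] = e^{-r(s-t)}$ for $s \ge t$, one obtains, with $\tau = T - t$,

$$\mathbb{E}\left[e^{-r\tau}F_T \mid \mathcal{F}_t\right] = e^{-c\tau}F_t - p\,\frac{e^{-r\tau} - e^{-c\tau}}{c - r},$$

and $e^{-c\tau}F_t - p\,\tau e^{-r\tau}$ when $c = r$. A minimal Python sketch (the function name is hypothetical):

```python
import math


def far_field_value(F, tau, r, c, p):
    """E[e^{-r tau} F_T | F_t = F] with constant fee rate c, deterministic fee p and a = 0.

    tau = T - t. Returns e^{-c tau} F - p (e^{-r tau} - e^{-c tau}) / (c - r),
    using the limit e^{-c tau} F - p tau e^{-r tau} when c = r.
    """
    if abs(c - r) < 1e-12:
        return math.exp(-c * tau) * F - p * tau * math.exp(-r * tau)
    return math.exp(-c * tau) * F - p * (math.exp(-r * tau) - math.exp(-c * tau)) / (c - r)
```

This value serves as the upper boundary condition when the grid in $F_t$ is truncated, since the guarantee is nearly worthless there.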

As is the case for the American put option,⁶ the VA contract with surrender option gives rise to a free boundary problem. In the continuation region, $V^*(t, F_t)$ satisfies Eq. (8), the same equation as the contract without surrender option. However, in the optimal surrender region, the value of the contract with surrender is the value of the surrender benefit:

$$V^*(t, F_t) = \psi(t, F_t), \quad t \in [0, T],\ F_t \in \mathcal{S}_t. \qquad (10)$$

For the contract with surrender, the PDE to solve is thus subject to the following conditions:


5 See, for example [5, Sect. 7.3]. 

6 See, for example [6] . 
$$V^*(T, F_T) = \max(G_T, F_T),$$

$$\lim_{F_t \to 0} V^*(t, F_t) = G_T e^{-r(T-t)},$$

$$\lim_{F_t \to B_t} V^*(t, F_t) = \psi(t, B_t),$$

$$\lim_{F_t \to B_t} \frac{\partial V^*}{\partial F_t}(t, F_t) = 1 - \kappa_t.$$

For any time $t \in [0, T]$, the value of the VA with surrender is given by $V(t, F_t) = \max\big(V^*(t, F_t), \psi(t, F_t)\big)$.

This free boundary problem is solved in Sect. 4 using numerical methods.


4 Numerical Example 


To price the VA using a PDE approach, we rewrite Eq. (8) in terms of $x_t = \log F_t$. We discretize the resulting equation over a rectangular grid with time steps $dt = 0.0001$ ($dt = 0.0002$ for $T = 15$) and $dx = \sigma\sqrt{3\,dt}$ (following suggestions by Racicot and Théoret [16]), from 0 to $T$ in $t$ and from 0 to $\log 450$ in $x$. We use an explicit scheme with central differences for the first and second derivatives in $x$.
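A minimal Python sketch of such an explicit scheme, assuming $a_t = 0$, a constant $p$, and the surrender charge $\kappa_t = 1 - e^{-\kappa(T-t)}$ used in Sect. 4.1. Substituting $u(t, x) = v(t, e^x)$ into Eq. (8) gives $u_t + \frac{1}{2}\sigma^2 u_{xx} + (r - c - \frac{1}{2}\sigma^2 - p\,e^{-x})u_x - ru = 0$. The grid bounds, $dx = \sigma\sqrt{3\,dt}$ and the boundary conditions follow the description above. The surrender case is handled by taking the maximum with the surrender benefit after each time step, a standard projection for American-type problems that is our assumption rather than a detail stated in the text. The names `solve_va` and `far_field` and the coarser default $dt = 0.001$ are hypothetical.

```python
import numpy as np


def far_field(F, tau, r, c, p):
    """E[e^{-r tau} F_T | F_t = F] for a = 0 and constant p (guarantee ignored)."""
    if abs(c - r) < 1e-12:
        return np.exp(-c * tau) * F - p * tau * np.exp(-r * tau)
    return np.exp(-c * tau) * F - p * (np.exp(-r * tau) - np.exp(-c * tau)) / (c - r)


def solve_va(T, r, sigma, c, p, G_T, surrender=False, kappa=0.0,
             dt=1e-3, x_min=0.0, x_max=np.log(450.0)):
    """Explicit finite differences for Eq. (8) in x = log F, marching back from T.

    Returns the fund grid F and the contract value at t = 0 on that grid.
    If surrender is True, the value is floored at the surrender benefit
    psi = e^{-kappa (T - t)} F, i.e. kappa_t = 1 - e^{-kappa (T - t)}.
    """
    dx = sigma * np.sqrt(3.0 * dt)                 # dx = sigma * sqrt(3 dt)
    x = np.arange(x_min, x_max + dx, dx)
    F = np.exp(x)
    n_t = int(round(T / dt))
    dt = T / n_t                                   # so that n_t * dt == T
    mu = r - c - 0.5 * sigma**2 - p * np.exp(-x[1:-1])  # drift of log F
    diff = 0.5 * sigma**2 / dx**2
    lo = dt * (diff - mu / (2.0 * dx))             # weight on u[i-1]
    up = dt * (diff + mu / (2.0 * dx))             # weight on u[i+1]
    mid = 1.0 - dt * (2.0 * diff + r)              # weight on u[i]
    if lo.min() < 0 or up.min() < 0 or mid < 0:
        raise ValueError("explicit scheme is not monotone; reduce dt")

    u = np.maximum(G_T, F)                         # payoff max(G_T, F_T)
    for n in range(n_t - 1, -1, -1):
        tau = T - n * dt                           # time to maturity at t_n
        new = np.empty_like(u)
        new[1:-1] = lo * u[:-2] + mid * u[1:-1] + up * u[2:]
        new[0] = G_T * np.exp(-r * tau)            # low fund: guarantee certain
        new[-1] = far_field(F[-1], tau, r, c, p)   # high fund: guarantee worthless
        if surrender:                              # optimal surrender: max with psi
            new = np.maximum(new, np.exp(-kappa * tau) * F)
        u = new
    return F, u
```

For example, `F, v = solve_va(10.0, 0.03, 0.2, 0.0158, 0.0, 100.0)` followed by `np.interp(100.0, F, v)` evaluates the maturity benefit at $F_0 = 100$; repeating the call with `surrender=True` and subtracting gives an estimate of $e(0, F_0)$. Large deterministic fees need a smaller `dt` (such as the $dt = 0.0001$ used above) to keep the scheme monotone near the lower edge of the grid.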

Throughout this section, we assume that the contract is priced so that only the maturity benefit is covered. In other words, we set $c$ and $p$ such that

$$P_0 = v(0, F_0), \qquad (11)$$

where $P_0$ denotes the initial premium paid by the policyholder. In this section, when the fee is set in this manner, we call it the fair fee, even if it does not cover the full value of the contract. We set the fee in this manner to calculate the value added by the possibility to surrender.
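When $p = 0$ (and $a = 0$, $g = 0$, so that $G_T = P_0$), $F_T$ is a geometric Brownian motion with "dividend yield" $c$, and $v(0, F_0)$ in (11) is a zero-coupon bond plus a Black-Scholes call, so the fair percentage fee can be found by bisection. The Python sketch below assumes the parameters of Sect. 4.1 ($P_0 = 100$, $r = 0.03$, $\sigma = 0.2$). By our own hand calculation, it gives fees close to the values reported for $p = 0$ in Table 1 (about 3.53 % for $T = 5$ and 1.58 % for $T = 10$). The helper names are hypothetical; with $p > 0$ the fund is path-dependent and the PDE of Sect. 3.3 is needed instead.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def maturity_value(c, P0=100.0, r=0.03, sigma=0.2, T=5.0, G=None):
    """v(0, P0) = G e^{-rT} + Black-Scholes call on F_T with 'dividend yield' c (p = a = 0)."""
    G = P0 if G is None else G             # g = 0 and no contributions: G_T = P0
    s = sigma * math.sqrt(T)
    d1 = (math.log(P0 / G) + (r - c + 0.5 * sigma**2) * T) / s
    d2 = d1 - s
    call = P0 * math.exp(-c * T) * norm_cdf(d1) - G * math.exp(-r * T) * norm_cdf(d2)
    return G * math.exp(-r * T) + call


def fair_fee(P0=100.0, lo=0.0, hi=0.2, tol=1e-10, **kw):
    """Bisection for c with P0 = v(0, P0); v decreases in c."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if maturity_value(mid, P0=P0, **kw) > P0:
            lo = mid                       # fee too low: value still above premium
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used because $v(0, F_0)$ is monotone in $c$: at $c = 0$ it exceeds $P_0$, and for a large enough fee it falls below.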


4.1 Numerical Results 

We now consider variable annuities with the maturity benefit described in Sect. 2. We assume that the initial premium is $P_0 = 100$, that there are no periodic premiums ($a_s = a = 0$), that the deterministic fee is constant ($p_t = p$), and that the guaranteed roll-up rate is $g = 0$. We further assume that the surrender charge, if any, is of the form $\kappa_t = 1 - e^{-\kappa(T-t)}$, and that $r = 0.03$ and $\sigma = 0.2$.

For contracts with and without surrender charge and with maturities of 5, 10 and 15 years, the results are presented in Table 1. In each case, the fee levels $c$ and $p$ are chosen such that $P_0 = v(0, F_0)$. As a percentage of the initial premium, the fair fee when it is paid as a deterministic amount is higher than the fair constant percentage fee. In fact, for high fund values, the deterministic fee is lower than the amount paid when the fee is set as a constant percentage. But when the fund value is low, the deterministic fee represents a larger proportion of the fund than the constant percentage fee. This higher proportion drags the fund value down and increases the value of the guarantee. The effect of each fee structure on the amount collected by the insurer can explain the difference between the fair fixed percentage and deterministic fees.

Table 1 Value of the surrender option in 5-, 10- and 15-year variable annuity contracts for various fee structures and surrender charges

T = 5
c (%)   p       Surrender option, κ = 0 %   Surrender option, κ = 0.5 %
0.00    4.150   3.09                        2.09
1.00    2.971   3.32                        2.33
2.00    1.796   3.56                        2.57
3.53    0.000   3.92                        2.94

T = 10
c (%)   p       Surrender option, κ = 0 %   Surrender option, κ = 0.5 %
0.00    2.032   3.07                        1.02
0.50    1.387   3.50                        1.46
1.00    0.744   3.92                        1.89
1.58    0.000   4.43                        2.39

T = 15
c (%)   p       Surrender option, κ = 0 %   Surrender option, κ = 0.4 %
0.00    1.259   2.76                        0.23
0.30    0.842   3.30                        0.77
0.60    0.427   3.84                        0.84
0.91    0.000   4.40                        1.86

For the 15-year contract, we lowered the surrender charge parameter to κ = 0.4 % to ensure that the optimal surrender boundary is always finite.

The results in Table 1 show that when the fee is set as a fixed amount, the value of 
the surrender option is always lower than when the fee is expressed as a percentage 
of the fund. When a mix of both types of fees is applied, the value of the surrender 
option decreases as the fee set as a percentage of the fund decreases. When the fee 
is deterministic, a lower percentage of the fund is paid out when the fund value is 
high. Consequently, the fee paid by the policyholder is lower when the guarantee 
is worth less, reducing the surrender incentive. This explains why the value of the 
surrender option is lower for deterministic fees. This result can be observed both with 
and without surrender charges. However, surrender charges decrease the value of the 
surrender option, as expected. The effect of using a deterministic amount fee, instead 
of a fixed percentage, is even more noticeable when a surrender charge is added. A 
lower surrender option value means that the possibility to surrender adds less value 
to the contract. In other words, if the contract is priced assuming that policyholders 
do not surrender, unexpected surrenders will result in a smaller loss, on average. 

Figure 1 shows the optimal surrender boundaries for the fee structures presented 
in Table 1 for 10-year contracts. As expected, the optimal boundaries are higher 
when there is a surrender charge. Those charges are put in place in part to discourage 
policyholders from surrendering early. The boundaries are also less sensitive to the 
fee structure when there is a surrender charge. In fact, when there is a surrender 
charge, setting the fee as a fixed amount leads to a higher optimal boundary during 
most of the contract. This highlights the advantage of the fixed amount fee structure 
combined with surrender charges. Without those charges, the fixed fee amount could 
lead to more surrenders. We also note that the limiting case $p = 0$ corresponds to the situation when fees are paid as a percentage of the fund. The optimal boundary obtained using the PDE approach in this paper coincides with the optimal boundary derived in [4] by solving an integral equation numerically.

Fig. 1 Optimal surrender boundary when T = 10 (panels: κ = 0 and κ = 0.005; horizontal axis: time in years)

Table 1 also shows the effect of the maturity, combined with the fee structure, on the surrender option. For all maturities, setting the fee as a fixed amount instead of a fixed percentage has a significant effect on the value of the surrender option, and this effect is amplified for longer maturities. As for the 10-year contract, combining the fixed amount fee with a surrender charge further reduces the value of the surrender option, especially when T = 15. The optimal surrender boundaries for different fee structures when T = 15 are presented in Fig. 2. For longer maturities such as this one, the combination of surrender charges and deterministic fees raises the surrender boundary more significantly.

Fig. 2 Optimal surrender boundary when T = 15 (panels: κ = 0 and κ = 0.004)

In all cases, the decrease in the value of the surrender option caused by the combination of a deterministic amount fee and a surrender charge is significant. In our example with a 15-year contract and a surrender charge (κ = 0.4 %), moving from a fee set entirely as a fixed percentage to a fee set as a deterministic amount reduces the value of the surrender option by over 85 %. This is surprising, since the shift in the optimal surrender boundary is not as large (as can be observed in Figs. 1 and 2). A possible explanation for the sharp decrease in the surrender option value is that the fee income lost when a policyholder surrenders at a high account value is less important, relative to the value of the guarantee, than in the constant percentage fee case.


5 Concluding Remarks 


In this chapter, the maturity guarantee fees are paid during the term of the contract as
a series of deterministic amounts instead of a percentage of the fund, the latter being
more common in the industry. We give a sufficient condition that allows the elimination
of optimal surrender incentives for variable annuity contracts with fairly general fee
structures. We also show how deterministic fees and surrender charges affect the
value of the surrender option and the optimal surrender boundary. In particular, we
highlight the efficiency of combining deterministic fees and exponential surrender
charges in decreasing the value of the surrender option. In fact, although the optimal
surrender boundary remains at a similar level, a fee set as a deterministic amount
reduces the value of the surrender option, which makes the contract less risky for the
insurer. This result also suggests that the state-dependent fee proposed in [2] could
be efficient in reducing the optimal surrender incentive as well. Future work could focus
on more general payouts (see, for example, [12] for ratchet and lookback options and [4]
for Asian benefits) in more general market models, and include death benefits.
A Variational Approach for Mean-Variance-Optimal Deterministic Consumption and Investment


Marcus C. Christiansen 


Abstract A significant number of life insurance contracts are based on
deterministic investment strategies; this justifies restricting the set of admissible
controls to deterministic controls. Optimal deterministic controls can be identified
by Hamilton-Jacobi-Bellman techniques, but for the corresponding partial differential
equations only numerical solutions are available, so the general existence
of optimal controls is unclear. We present a non-constructive existence result and
derive necessary characterizations for optimal controls by using a Pontryagin maximum
principle. Furthermore, based on the variational idea of the Pontryagin maximum
principle, we derive a numerical optimization algorithm for the calculation of
optimal controls.


1 Introduction 

Among many other applications, individual investment strategies arise in pension 
saving contracts, see for example Cairns [6]. While in dynamic optimal consumption- 
investment problems one typically aims to find an optimal control from the set of 
adapted processes, in insurance practice quite a number of contracts rely on deter- 
ministic investment strategies. Deterministic investment and consumption strategies 
have the advantage that they are easier to organize in asset management, that they 
make future consumption predictable, and that they are easier to communicate. From 
a mathematical point of view, deterministic control avoids unwanted features of sto- 
chastic control such as diffusive consumption, satisfaction points and consistency 
problems. For further arguments and a detailed comparison of stochastic versus 
deterministic control see also Menkens [17]. 

The present paper is motivated by Christiansen and Steffensen [9], where
mean-variance-optimal deterministic consumption and investment is discussed in a
Black-Scholes market. Sufficient conditions for optimal strategies are derived from a
Hamilton-Jacobi-Bellman approach, but only numerical solutions and no analytical
solutions are given. That means that the general existence of solutions remains
unclear. We fill that gap, allowing for a slightly more general model with non-constant
Black-Scholes market parameters. By applying a Pontryagin maximum principle, we
additionally verify that the sufficient conditions of Christiansen and Steffensen [9]
for optimal controls are actually necessary. Furthermore, we present an alternative
numerical algorithm for the calculation of optimal controls. To this end, we make use
of the variational idea behind the Pontryagin maximum principle. In a first step,
we define generalized gradients for our objective function, which, in a second step,
allow us to construct a gradient ascent method.

Mean-variance investment has been a true classic since the seminal work by Markowitz
[16]. Since then, various authors have improved and extended the results, see for
example Korn and Trautmann [12], Korn [13], Zhou and Li [18], Basak and Chabakauri
[3], Kryger and Steffensen [15], Kronborg and Steffensen [14], Alp and Korn [1],
Björk, Murgoci and Zhou [5] and others.

Deterministic optimal control is central to the work of Herzog et al. [11] and Geering
et al. [10]. Apart from other differences, however, these authors disregard income and
consumption and focus on the pure portfolio problem without cash flows. Bäuerle and
Rieder [2] study optimal investment for both adapted stochastic strategies and
deterministic strategies. They discuss various objectives, including mean-variance
objectives under constraints. In the present paper, we discuss an unconstrained
mean-variance objective, and we also control for consumption.

The paper is structured as follows. In Sect. 2, we set up a basic model framework 
and specify the optimal consumption and investment problem that we discuss here. 
In Sect. 3, we present an existence result for the optimal control. Section 4 derives 
necessary conditions for optimal controls by applying a Pontryagin maximum prin- 
ciple. Section 5 defines and calculates generalized gradients for the objective, which 
helps to set up a numerical optimization algorithm in Sect. 6. In Sect. 7 we illustrate 
the numerical algorithm. 


2 The Mean-Variance-Optimal Deterministic Consumption and Investment Problem

Let $B([0,T])$ denote the space of bounded Borel-measurable functions, equipped
with the uniform norm $\|\cdot\|_\infty$. On some finite time interval $[0,T]$, we assume that
we have a continuous income with nonnegative rate $a \in B([0,T])$ and a continuous
consumption with nonnegative rate $c \in B([0,T])$. Let $C([0,T])$ be the set of
continuous functions on $[0,T]$. The positive initial wealth $x_0$ and the stochastic
wealth $X(t)$ at $t > 0$ are distributed between a bank account with risk-free interest
rate $r \in C([0,T])$ and a stock or stock fund with price process

$$dS(t) = S(t)\alpha(t)\,dt + S(t)\sigma(t)\,dW(t), \quad S(0) = 1,$$
where $\alpha(t) > r(t) > 0$, $\sigma(t) > 0$ and $\alpha, \sigma \in C([0,T])$. We write $\pi(t)$ for the
proportion of the total value invested in stocks and call it the investment strategy.
The wealth process $X(t)$ is assumed to be self-financing. Thus, it satisfies

$$dX(t) = X(t)\big(r(t) + (\alpha(t) - r(t))\pi(t)\big)\,dt + (a(t) - c(t))\,dt + X(t)\sigma(t)\pi(t)\,dW(t) \tag{1}$$

with initial value $X(0) = x_0$ and has the explicit representation

$$X(t) = x_0\,e^{\int_0^t dU(u)} + \int_0^t (a(s) - c(s))\,e^{\int_s^t dU(u)}\,ds, \tag{2}$$

where

$$dU(t) = \Big(r(t) + (\alpha(t) - r(t))\pi(t) - \tfrac{1}{2}\sigma(t)^2\pi(t)^2\Big)dt + \sigma(t)\pi(t)\,dW(t), \quad U(0) = 0. \tag{3}$$
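Since $\pi$ is deterministic, taking expectations in (1) removes the $dW$ term, which is where the mean equation (6) further below comes from. The Python sketch below is our own illustration, not part of the paper: it simulates (1) with the Euler-Maruyama scheme and compares the sample mean of $X(T)$ with the same Euler recursion applied to the mean. It assumes constant $r$, $\alpha$, $\sigma$ and a constant savings rate $a - c$; the function names are hypothetical, and the parameter values are borrowed from the numerical example in Sect. 7.

```python
import numpy as np

def simulate_terminal_wealth(x0, r, alpha, sigma, pi, savings, T,
                             n_steps=200, n_paths=20_000, seed=1):
    """Euler-Maruyama paths of (1):
    dX = X (r + (alpha - r) pi) dt + (a - c) dt + X sigma pi dW.

    Sketch assumptions: constant r, alpha, sigma and savings = a - c;
    pi is a deterministic function of time.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    for j in range(n_steps):
        q = pi(j * dt)
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        X = X + (X * (r + (alpha - r) * q) + savings) * dt + X * sigma * q * dW
    return X

def expected_terminal_wealth(x0, r, alpha, pi, savings, T, n_steps=200):
    """The same Euler recursion applied to the mean equation (6)."""
    dt = T / n_steps
    m = float(x0)
    for j in range(n_steps):
        m = m + (m * (r + (alpha - r) * pi(j * dt)) + savings) * dt
    return m

pi = lambda t: 0.5                      # constant investment rate (starting control of Sect. 7)
X_T = simulate_terminal_wealth(200.0, 0.04, 0.06, 0.2, pi, 20.0, 20.0)
print(X_T.mean(), X_T.std() / np.sqrt(X_T.size),
      expected_terminal_wealth(200.0, 0.04, 0.06, pi, 20.0, 20.0))
```

On the Euler-Maruyama grid the mean obeys exactly the recursion in `expected_terminal_wealth`, so the two printed means differ only by Monte Carlo error (the second printed number is its standard error).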

It is important to note that the process $(X(t))_{t\ge0}$ depends on the choice of the
investment strategy $(\pi(t))_{t\ge0}$ and the consumption rate $(c(t))_{t\ge0}$. In order to make
that dependence more visible, we will also write $X = X^{(\pi,c)}$. For some arbitrary but
fixed risk aversion parameter $\gamma > 0$ of the investor, we define the risk measure

$$\mathrm{MV}_\gamma[\,\cdot\,] := E[\,\cdot\,] - \gamma\operatorname{Var}[\,\cdot\,].$$

We aim to maximize the functional

$$G(\pi, c) := \mathrm{MV}_\gamma\Big[\int_0^T e^{-\rho s}c(s)\,ds + e^{-\rho T}X^{(\pi,c)}(T)\Big] \tag{4}$$

with respect to the investment strategy $\pi$ and the consumption rate $c$. The parameter
$\rho > 0$ describes the preference for consuming today instead of tomorrow.
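Because the consumption integral in (4) is deterministic, the mean-variance functional splits into a deterministic part and a functional of $X(T)$ alone. The following worked step only uses $E[v + uZ] = v + uE[Z]$ and $\operatorname{Var}[v + uZ] = u^2\operatorname{Var}[Z]$ for constants $u$, $v$ (generic symbols introduced here); the resulting identity is the one behind the upper bound for $G$ in the proof of Theorem 1.

```latex
\begin{aligned}
G(\pi,c) &= \int_0^T e^{-\rho s}c(s)\,ds + e^{-\rho T}E\big[X^{(\pi,c)}(T)\big]
           - \gamma e^{-2\rho T}\operatorname{Var}\big[X^{(\pi,c)}(T)\big]\\
         &= \int_0^T e^{-\rho s}c(s)\,ds
           + e^{-\rho T}\,\mathrm{MV}_{\gamma e^{-\rho T}}\big[X^{(\pi,c)}(T)\big].
\end{aligned}
```

In particular, the risk aversion that effectively acts on the undiscounted terminal wealth is $\gamma e^{-\rho T}$ rather than $\gamma$.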


3 Existence of Optimal Deterministic Control Functions 

In Christiansen and Steffensen [9], where a Hamilton-Jacobi-Bellman approach is 
used, the existence of optimal control functions is related to the existence of solu- 
tions for the Hamilton-Jacobi-Bellman partial differential equation. However, only 
numerical solutions are available, so the general existence of solutions is unclear. 
Here, we fill that gap by giving an existence result for optimal deterministic control 
functions. The proof needs rather weak assumptions, but it is not constructive. 
Theorem 1 Let $G : D \to (-\infty, \infty)$ be defined by (4) for

$$D := \{(\pi, c) \in B([0,T]) \times B([0,T]) : \underline{c}(t) \le c(t) \le \overline{c}(t),\ t \in [0,T]\}$$

with lower and upper consumption bounds $\underline{c}, \overline{c} \in B([0,T])$. Then the functional $G$
is continuous and has a finite upper bound.

Proof We first show that $\mathrm{MV}_\gamma[X(T)] = \mathrm{MV}_\gamma[X^{(\pi,c)}(T)]$ has a finite upper bound
that does not depend on $(\pi, c)$. Defining the stochastic process

$$Y(t) := X(t) - \gamma X(t)^2 + \gamma E[X(t)]^2, \quad t \in [0,T], \tag{5}$$

we have $\mathrm{MV}_\gamma[X(T)] = E[Y(T)]$. So it suffices to show that $E[Y(T)]$ has a finite
upper bound that does not depend on $(\pi, c)$. Since the quadratic variation process of
$X$ satisfies $d[X](t) = X(t)^2\sigma(t)^2\pi(t)^2\,dt$, from Itô's Lemma we get that

$$dE[X(t)] = E[X(t)]\{r(t) + (\alpha(t) - r(t))\pi(t)\}\,dt + (a(t) - c(t))\,dt, \tag{6}$$

$$dE[X(t)^2] = 2E[X(t)^2]\{r(t) + (\alpha(t) - r(t))\pi(t)\}\,dt + 2E[X(t)](a(t) - c(t))\,dt + E[X(t)^2]\sigma(t)^2\pi(t)^2\,dt. \tag{7}$$

Hence, the expectation function of $Y$ solves the differential equation

$$\begin{aligned} dE[Y(t)] ={}& E[X(t)]\{r(t) + (\alpha(t) - r(t))\pi(t)\}\,dt + (a(t) - c(t))\,dt\\ &- \gamma E[X(t)^2]\sigma(t)^2\pi(t)^2\,dt - 2\gamma\big(E[X(t)^2] - E[X(t)]^2\big)\{r(t) + (\alpha(t) - r(t))\pi(t)\}\,dt. \end{aligned} \tag{8}$$

The right-hand side of (8) is maximal with respect to $\pi(t)$ for

$$\pi(t) = \frac{1}{2\gamma}\,\frac{\alpha(t) - r(t)}{\sigma(t)^2}\,\frac{E[X(t)] - 2\gamma\operatorname{Var}[X(t)]}{E[X(t)^2]}. \tag{9}$$
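To see why (9) maximizes the right-hand side of (8), and where the last term of (10) comes from, one can complete the square in $\pi$. Writing $\pi^\circ(t)$ (a label introduced here) for the right-hand side of (9), the $\pi$-dependent part of (8) satisfies

```latex
(\alpha(t)-r(t))\big(E[X(t)]-2\gamma\operatorname{Var}[X(t)]\big)\,\pi
  - \gamma\sigma(t)^2E[X(t)^2]\,\pi^2
= -\gamma\sigma(t)^2E[X(t)^2]\big(\pi-\pi^\circ(t)\big)^2
  + \frac{(\alpha(t)-r(t))^2\big(E[X(t)]-2\gamma\operatorname{Var}[X(t)]\big)^2}
         {4\gamma\sigma(t)^2E[X(t)^2]},
```

while, using $E[X(t)^2]-E[X(t)]^2=\operatorname{Var}[X(t)]$, the remaining terms of (8) sum to $r(t)\big(E[X(t)]-2\gamma\operatorname{Var}[X(t)]\big) + a(t) - c(t)$. Dropping the nonpositive square (for $E[X(t)^2] > 0$) gives exactly the bound (10).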

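Plugging (9) into (8) and rearranging terms yields

$$\begin{aligned} dE[Y(t)] \le{}& r(t)\big(E[X(t)] - 2\gamma\operatorname{Var}[X(t)]\big)\,dt + (a(t) - c(t))\,dt\\ &+ \frac{1}{4\gamma}\,\frac{(\alpha(t) - r(t))^2}{\sigma(t)^2}\,\frac{\big(E[X(t)] - 2\gamma\operatorname{Var}[X(t)]\big)^2}{E[X(t)^2]}\,dt. \end{aligned} \tag{10}$$

Recall that we assumed $\gamma > 0$ and $\sigma(t) > 0$, so the first and second denominators are
never zero. If the third denominator $E[X(t)^2]$ is zero, we implicitly get $E[X(t)] = 0$,
and (10) is still true by defining $0/0 := 0$. The first line on the right-hand side of
(10) has an upper bound of

$$r(t)E[Y(t)]\,dt + (a(t) - c(t))\,dt,$$

since $r(t) > 0$ and $E[X(t)] - 2\gamma\operatorname{Var}[X(t)] \le E[X(t)] - \gamma\operatorname{Var}[X(t)] = E[Y(t)]$.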
With the help of the equality

$$\big(E[X(t)] - 2\gamma\operatorname{Var}[X(t)]\big)^2 = E[X(t)]^2 - 4\gamma\operatorname{Var}[X(t)]\,E[Y(t)]$$

and the inequalities $E[X(t)]^2 \le E[X(t)^2]$ and $\operatorname{Var}[X(t)] \le E[X(t)^2]$, we can
show that the second line on the right-hand side of (10) has an upper bound of

$$\frac{1}{4\gamma}\,\frac{(\alpha(t) - r(t))^2}{\sigma(t)^2}\big(1 + 4\gamma|E[Y(t)]|\big)\,dt.$$

All in all, we obtain

$$dE[Y(t)] \le \big(C_1|E[Y(t)]| + C_2\big)\,dt, \quad t \in [0,T] \tag{11}$$

for some finite positive constants $C_1$ and $C_2$, since the functions $r(t)$, $\alpha(t)$, $a(t)$ are
uniformly bounded on $[0,T]$, since $-c(t) \le -\underline{c}(t)$ for a uniformly bounded function
$\underline{c}$, and since the positive and continuous function $\sigma(t)$ has a uniform lower bound
greater than zero. Thus, we have $E[Y(t)] \le g(t)$ for $g(t)$ defined by the differential
equation

$$dg(t) = \big(C_1|g(t)| + C_2\big)\,dt, \quad g(0) = Y(0) = x_0 > 0. \tag{12}$$

This differential equation for $g(t)$ has a unique solution, which is bounded on $[0,T]$
and does not depend on the choice of $(\pi, c)$. Hence, $\mathrm{MV}_\gamma[X(T)] = E[Y(T)]$ also
has a finite upper bound that does not depend on the choice of $(\pi, c)$. The same is
true for the functional (4), since

$$G(\pi, c) \le \int_0^T e^{-\rho s}c(s)\,ds + e^{-\rho T}\,\mathrm{MV}_{\gamma e^{-\rho T}}[X(T)].$$
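The boundedness of the solution of (12) can be made explicit. Since $C_1, C_2 > 0$ and $g(0) = x_0 > 0$, the solution is increasing and stays positive, so $|g(t)| = g(t)$ and (12) reduces to the linear equation $g' = C_1 g + C_2$:

```latex
g(t) = \Big(x_0 + \frac{C_2}{C_1}\Big)e^{C_1 t} - \frac{C_2}{C_1}
\;\le\; \Big(x_0 + \frac{C_2}{C_1}\Big)e^{C_1 T}, \qquad t \in [0,T].
```

Only $C_1$, $C_2$, $x_0$ and $T$ enter this bound, which is why it does not depend on $(\pi, c)$; the inequality $E[Y(t)] \le g(t)$ itself follows from the usual comparison argument for differential inequalities with equal initial values.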

Now we show the continuity of the functional $G$. Suppose that $(\pi_n, c_n)_{n\ge1}$ is
an arbitrary but fixed sequence in $D$ that converges to $(\pi_0, c_0)$ with respect to the
supremum norm. Since $D$ is a closed subset of the Banach space $B([0,T]) \times B([0,T])$,
the limit $(\pi_0, c_0)$ is also an element of $D$. Let $X_n(t) := X^{(\pi_n, c_n)}(t)$ for all $t$. As the
sequence $(\pi_n, c_n)_{n\ge1}$ is convergent and within $D$, the absolute values $|\pi_n(t)|$ and
$|c_n(t)|$ have finite upper bounds, uniformly in $n$ and uniformly in $t$. Therefore,
analogously to inequality (11), from Eq. (6) we get that

$$dE[X_n(t)] \le \big(C_3|E[X_n(t)]| + C_4\big)\,dt, \quad t \in [0,T],\ n = 0, 1, 2, \dots$$

for some positive finite constants $C_3$ and $C_4$. Arguing analogously to (12), we obtain
that $E[X_n(t)] \le f(t)$ for some bounded function $f(t)$. Using similar arguments for
$-E[X_n(t)]$, we get that the absolute value $|E[X_n(t)]|$ is also uniformly bounded in $n$
and in $t$. Applying Eq. (7), we obtain

$$dE[X_n(t)^2] = 2E[X_n(t)^2]\{r(t) + (\alpha(t) - r(t))\pi_n(t)\}\,dt + 2E[X_n(t)](a(t) - c_n(t))\,dt + E[X_n(t)^2]\sigma(t)^2\pi_n(t)^2\,dt.$$

Using the uniform boundedness of $|E[X_n(t)]|$, $|\pi_n(t)|$ and $|c_n(t)|$, we can conclude
that

$$dE[X_n(t)^2] \le \big(C_5E[X_n(t)^2] + C_6\big)\,dt, \quad t \in [0,T],\ n = 0, 1, 2, \dots$$

for some positive finite constants $C_5$ and $C_6$. Hence, arguing analogously to above,
the value $E[X_n(t)^2]$ is uniformly bounded in $n$ and in $t$. Let $Y_n(t)$ be the process
according to definition (5) but with $X_n$ instead of $X$. Using (8) and the uniform
boundedness of $|E[X_n(t)]|$, $E[X_n(t)^2]$, $|\pi_n(t)|$ and $|c_n(t)|$, we can show that

$$dE[Y_0(t) - Y_n(t)] \le \Big(C_7\sup_{t\in[0,T]}|\pi_0(t) - \pi_n(t)| + \sup_{t\in[0,T]}|c_0(t) - c_n(t)|\Big)\,dt, \quad t \in [0,T]$$

for some positive finite constant $C_7$. Thus, we get

$$E[Y_0(T) - Y_n(T)] \le T C_7\sup_{t\in[0,T]}|\pi_0(t) - \pi_n(t)| + T\sup_{t\in[0,T]}|c_0(t) - c_n(t)|,$$

where we used that $Y_0(0) - Y_n(0) = x_0 - x_0 = 0$. Arguing similarly for
$-E[Y_0(t) - Y_n(t)]$, we can conclude that

$$\begin{aligned} |G(\pi_0, c_0) - G(\pi_n, c_n)| &= \Big|\int_0^T e^{-\rho s}(c_0(s) - c_n(s))\,ds + e^{-\rho T}\mathrm{MV}_{\gamma e^{-\rho T}}[X_0(T)] - e^{-\rho T}\mathrm{MV}_{\gamma e^{-\rho T}}[X_n(T)]\Big|\\ &\le T\sup_{t\in[0,T]}|c_0(t) - c_n(t)| + e^{-\rho T}\big|E[Y_0(T) - Y_n(T)]\big|\\ &\le T C_8\sup_{t\in[0,T]}|\pi_0(t) - \pi_n(t)| + 2T\sup_{t\in[0,T]}|c_0(t) - c_n(t)| \end{aligned}$$

for some finite constant $C_8$, where the processes $Y_0(t)$ and $Y_n(t)$ are defined as
above but with $\gamma$ replaced by $\gamma e^{-\rho T}$. Since we assumed that $(\pi_n, c_n)_{n\ge1}$ converges in
supremum norm, we obtain that $G(\pi_n, c_n)$ converges to $G(\pi_0, c_0)$, i.e., the functional
$G$ is continuous.




As $G$ has a finite upper bound on the domain $D$, the supremum

$$\sup_{(\pi,c)\in D} G(\pi, c)$$

indeed exists. Since $G$ is continuous and $D$ is a closed subset of the Banach space
$B([0,T]) \times B([0,T])$, we can conclude that on each compact subset $K$ of $D$ there
exists a pair $(\pi^*, c^*)$ for which

$$G(\pi^*, c^*) = \sup_{(\pi,c)\in K} G(\pi, c). \tag{13}$$


4 A Pontryagin Maximum Principle 

Christiansen and Steffensen [9] identify characterizing equations for the optimal
investment and consumption rates by using a Hamilton-Jacobi-Bellman approach. Here, we
show that those characterizing equations are indeed necessary by using a Pontryagin
maximum principle (cf. Bertsekas [4]).

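Defining the moment functions

$$\begin{aligned} m_i(t) &= E\big[(X(t))^i\big], & i &= 1, 2,\\ p_i(t) &= E\Big[\Big(\int_t^T (a(s) - c(s))\,e^{\int_s^T dU(u)}\,ds\Big)^i\Big], & i &= 1, 2,\\ n_i(t) &= E\Big[e^{i\int_t^T dU(u)}\Big], & i &= 1, 2,\\ k(t) &= E\Big[e^{\int_t^T dU(u)}\int_t^T (a(s) - c(s))\,e^{\int_s^T dU(u)}\,ds\Big] \end{aligned} \tag{14}$$

as in Christiansen and Steffensen [9], we can represent the objective function $G(\pi, c)$
by

$$\begin{aligned} G(\pi, c) ={}& \int_0^T e^{-\rho s}c(s)\,ds + e^{-\rho T}\big(m_1(t)n_1(t) + p_1(t)\big)\\ &- \gamma e^{-2\rho T}\Big(m_2(t)n_2(t) + 2m_1(t)k(t) + p_2(t) - \big(m_1(t)n_1(t) + p_1(t)\big)^2\Big) \end{aligned} \tag{15}$$

for any $t$ in $[0, T]$. Simple calculations give us that

$$\begin{aligned} \frac{d}{dt}m_1(t) &= \big(r(t) + (\alpha(t) - r(t))\pi(t)\big)m_1(t) + \big(a(t) - c(t)\big),\\ \frac{d}{dt}m_2(t) &= \big(2r(t) + 2(\alpha(t) - r(t))\pi(t) + \pi(t)^2\sigma(t)^2\big)m_2(t) + 2\big(a(t) - c(t)\big)m_1(t). \end{aligned} \tag{16}$$

Similarly to $m_1$ and $m_2$, the functions $n_1$, $n_2$, $p_1$, $p_2$ and $k$ also solve a system of
ordinary differential equations, but with terminal instead of initial conditions, see
Christiansen and Steffensen [9].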
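For a deterministic investment strategy the moment functions (14) reduce to one-dimensional integrals, because by (3) $E[\exp(\int_t^s dU(u))] = \exp(\int_t^s (r(u) + (\alpha(u) - r(u))\pi(u))\,du)$ and increments of $U$ over disjoint intervals are independent. The Python sketch below is our own illustration under the assumptions of constant $r$, $\alpha$, $\sigma$ and a constant savings rate $a - c$ (function names hypothetical). It evaluates $n_1, n_2, p_1, k$ from the resulting closed forms, uses $p_2(t) = 2\int_t^T (a(s)-c(s))k(s)\,ds$, which follows from the same independence argument, solves (16) forward with a Runge-Kutta scheme, and checks that the first and second moments of $X(T)$ assembled as in (15) do not depend on $t$.

```python
import numpy as np

def integral_to_T(f, t):
    """Trapezoidal approximation of int_t^T f(s) ds at every grid point t[j]."""
    seg = 0.5 * (f[1:] + f[:-1]) * np.diff(t)
    return np.append(np.cumsum(seg[::-1])[::-1], 0.0)

def moment_functions(x0, r, alpha, sigma, pi, savings, T, n=4000):
    """Moment functions (14) for a deterministic pi (constant market parameters assumed)."""
    t = np.linspace(0.0, T, n + 1)
    q = pi(t)
    mu1 = r + (alpha - r) * q                                  # growth rate of E[exp(U)]
    n1 = np.exp(integral_to_T(mu1, t))                         # n_1(t)
    n2 = np.exp(integral_to_T(2.0 * mu1 + sigma**2 * q**2, t)) # n_2(t)
    p1 = integral_to_T(savings * n1, t)
    k = n1 * integral_to_T(savings * n2 / n1, t)
    p2 = 2.0 * integral_to_T(savings * k, t)

    def rhs(s, y):  # right-hand sides of the forward equations (16)
        qs = pi(s)
        return np.array([(r + (alpha - r) * qs) * y[0] + savings,
                         (2 * r + 2 * (alpha - r) * qs + qs**2 * sigma**2) * y[1]
                         + 2 * savings * y[0]])

    y = np.empty((n + 1, 2))
    y[0] = [x0, x0**2]
    h = T / n
    for j in range(n):  # classical fourth-order Runge-Kutta
        s1 = rhs(t[j], y[j])
        s2 = rhs(t[j] + h / 2, y[j] + h / 2 * s1)
        s3 = rhs(t[j] + h / 2, y[j] + h / 2 * s2)
        s4 = rhs(t[j] + h, y[j] + h * s3)
        y[j + 1] = y[j] + h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
    return dict(t=t, m1=y[:, 0], m2=y[:, 1], n1=n1, n2=n2, p1=p1, p2=p2, k=k)

mom = moment_functions(200.0, 0.04, 0.06, 0.2, lambda s: 0.8 - 0.025 * s, 20.0, 20.0)
EX_T = mom["m1"] * mom["n1"] + mom["p1"]                              # E[X(T)] as in (15)
EX2_T = mom["m2"] * mom["n2"] + 2 * mom["m1"] * mom["k"] + mom["p2"]  # E[X(T)^2] as in (15)
print(EX_T.min(), EX_T.max(), EX2_T.min(), EX2_T.max())
```

Both assembled moments should be flat in $t$ and equal $m_1(T)$ and $m_2(T)$ from the forward solve, up to discretization error.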
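Theorem 2 Let $(\pi^*, c^*)$ be an optimal control in the sense of (13), and let $m_i^*(t)$,
$p_i^*(t)$, $n_i^*(t)$, $i = 1, 2$, and $k^*(t)$ be the corresponding moment functions according
to (14). Then we necessarily have

$$\pi^*(t) = \frac{\alpha(t) - r(t)}{\sigma(t)^2}\left(\frac{m_1^*(t)n_1^*(t)e^{\rho T} - 2\gamma m_1^*(t)\big(k^*(t) - n_1^*(t)m_1^*(T)\big)}{2\gamma m_2^*(t)n_2^*(t)} - 1\right), \tag{17}$$

$$c^*(t) = \begin{cases} \overline{c}(t) & \text{if } e^{\rho(T-t)} - n_1^*(t) + 2\gamma e^{-\rho T}\big(m_1^*(t)n_2^*(t) + k^*(t) - n_1^*(t)m_1^*(T)\big) > 0,\\ \underline{c}(t) & \text{else.} \end{cases} \tag{18}$$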


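Proof With $(\pi^*, c^*)$ being an optimal control, we define local alternatives by

$$(\pi_\varepsilon(t), c_\varepsilon(t)) := \begin{cases} (\pi^*(t), c^*(t)) + (h(t), l(t)) & \text{for } t \in (t_0 - \varepsilon, t_0],\\ (\pi^*(t), c^*(t)) & \text{else} \end{cases}$$

for continuous functions $h$ and $l$, and we write $m_i^\varepsilon$ for the moment functions of
$(\pi_\varepsilon, c_\varepsilon)$. The values of $n_i^*$, $p_i^*$ and $k^*$ at time $t_0$ depend only on the controls after
$t_0$ and are therefore not affected by the perturbation. As $G(\pi^*, c^*)$ is maximal, by
applying (15) for $t = t_0$ we obtain that

$$\begin{aligned} G(\pi^*, c^*) - G(\pi_\varepsilon, c_\varepsilon) ={}& -\int_{t_0-\varepsilon}^{t_0} e^{-\rho s}l(s)\,ds + e^{-\rho T}\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)n_1^*(t_0)\\ &- \gamma e^{-2\rho T}\Big(\big(m_2^*(t_0) - m_2^\varepsilon(t_0)\big)n_2^*(t_0) + 2\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)k^*(t_0)\\ &\qquad - \big(m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2\big)n_1^*(t_0)^2 - 2\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)n_1^*(t_0)p_1^*(t_0)\Big)\\ ={}& -\int_{t_0-\varepsilon}^{t_0} e^{-\rho s}l(s)\,ds + \big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)\Big(e^{-\rho T}n_1^*(t_0) - 2\gamma e^{-2\rho T}\big(k^*(t_0) - n_1^*(t_0)p_1^*(t_0)\big)\Big)\\ &+ \big(m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2\big)\gamma e^{-2\rho T}n_1^*(t_0)^2 - \big(m_2^*(t_0) - m_2^\varepsilon(t_0)\big)\gamma e^{-2\rho T}n_2^*(t_0) \end{aligned}$$

must be nonnegative. Equation (16) implies that

$$m_1^*(t_0) - m_1^\varepsilon(t_0) = \int_{t_0-\varepsilon}^{t_0}\Big(\big(r(s) + (\alpha(s) - r(s))\pi^*(s)\big)m_1^*(s) - \big(r(s) + (\alpha(s) - r(s))\pi_\varepsilon(s)\big)m_1^\varepsilon(s) + l(s)\Big)\,ds. \tag{19}$$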


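Since $\|m_1^* - m_1^\varepsilon\|_\infty \to 0$ for $\varepsilon \to 0$, and since $m_1^*(t) \to m_1^*(t_0)$, $r(t) \to r(t_0)$,
$\alpha(t) \to \alpha(t_0)$, $\sigma(t) \to \sigma(t_0)$ for $t \to t_0$, we get that

$$m_1^*(t_0) - m_1^\varepsilon(t_0) = -(\alpha(t_0) - r(t_0))m_1^*(t_0)\int_{t_0-\varepsilon}^{t_0} h(s)\,ds + \int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon). \tag{20}$$

For the squared functions we use

$$\begin{aligned} m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2 &= \big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)\big(m_1^*(t_0) + m_1^\varepsilon(t_0)\big)\\ &= 2m_1^*(t_0)\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big) - \big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)^2 \end{aligned}$$

and then apply the asymptotic formula (20), which leads to

$$m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2 = -2(\alpha(t_0) - r(t_0))m_1^*(t_0)^2\int_{t_0-\varepsilon}^{t_0} h(s)\,ds + 2m_1^*(t_0)\int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon).$$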


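Similarly, we can show that

$$m_2^*(t_0) - m_2^\varepsilon(t_0) = \Big(-2(\alpha(t_0) - r(t_0))m_2^*(t_0) - 2\pi^*(t_0)\sigma(t_0)^2m_2^*(t_0)\Big)\int_{t_0-\varepsilon}^{t_0} h(s)\,ds + 2m_1^*(t_0)\int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon).$$

Plugging these expansions into the difference $G(\pi^*, c^*) - G(\pi_\varepsilon, c_\varepsilon)$ and rearranging, we get

$$\begin{aligned} o(\varepsilon) \le{}& \int_{t_0-\varepsilon}^{t_0} l(s)\,ds\,\Big(-e^{-\rho t_0} + e^{-\rho T}n_1^*(t_0) - 2\gamma e^{-2\rho T}\big(m_1^*(t_0)n_2^*(t_0) + k^*(t_0) - n_1^*(t_0)\big(n_1^*(t_0)m_1^*(t_0) + p_1^*(t_0)\big)\big)\Big)\\ &+ \int_{t_0-\varepsilon}^{t_0} h(s)\,ds\,\big(\alpha(t_0) - r(t_0)\big)\Big(-m_1^*(t_0)n_1^*(t_0)e^{-\rho T} - 2\gamma e^{-2\rho T}m_1^*(t_0)\big(-k^*(t_0) + n_1^*(t_0)\big(n_1^*(t_0)m_1^*(t_0) + p_1^*(t_0)\big)\big)\\ &\qquad - 2\gamma e^{-2\rho T}m_2^*(t_0)n_2^*(t_0)\Big(-1 - \pi^*(t_0)\frac{\sigma(t_0)^2}{\alpha(t_0) - r(t_0)}\Big)\Big) \end{aligned} \tag{21}$$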

for all continuous functions $l$ and $h$. Note that $n_1^*(t_0)m_1^*(t_0) + p_1^*(t_0) = m_1^*(T)$.
Consequently, the sign of $l(t_0)$ must equal the sign of

$$-e^{-\rho t_0} + e^{-\rho T}n_1^*(t_0) - 2\gamma e^{-2\rho T}\big(m_1^*(t_0)n_2^*(t_0) + k^*(t_0) - n_1^*(t_0)m_1^*(T)\big),$$

which means that (18) holds, and we necessarily have that

$$0 = -m_1^*(t_0)n_1^*(t_0)e^{-\rho T} - 2\gamma e^{-2\rho T}m_2^*(t_0)n_2^*(t_0)\Big(-1 - \pi^*(t_0)\frac{\sigma(t_0)^2}{\alpha(t_0) - r(t_0)}\Big) - 2\gamma e^{-2\rho T}m_1^*(t_0)\big(-k^*(t_0) + n_1^*(t_0)m_1^*(T)\big), \tag{22}$$

which means that (17) is satisfied.
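The step from (22) to (17) is pure algebra: (22) is linear in $\pi^*(t_0)$, with positive slope $2\gamma e^{-2\rho T}m_2^*n_2^*\sigma^2/(\alpha - r)$, so it has exactly one root. The Python sketch below evaluates (17) and plugs the result back into the right-hand side of (22). The moment values are hypothetical numbers chosen only for illustration, not results from the paper, and the function names are our own.

```python
import numpy as np

def pi_star(alpha, r, sigma, gamma, rho, T, m1, m2, n1, n2, k, m1_T):
    """Investment rate from (17), given moment values at one time point."""
    numerator = m1 * n1 * np.exp(rho * T) - 2 * gamma * m1 * (k - n1 * m1_T)
    return (alpha - r) / sigma**2 * (numerator / (2 * gamma * m2 * n2) - 1.0)

def foc_residual(pi, alpha, r, sigma, gamma, rho, T, m1, m2, n1, n2, k, m1_T):
    """Right-hand side of (22) evaluated at a candidate investment rate pi."""
    e1, e2 = np.exp(-rho * T), np.exp(-2 * rho * T)
    return (-m1 * n1 * e1
            - 2 * gamma * e2 * m2 * n2 * (-1 - pi * sigma**2 / (alpha - r))
            - 2 * gamma * e2 * m1 * (-k + n1 * m1_T))

# Hypothetical inputs (illustration only).
market = dict(alpha=0.06, r=0.04, sigma=0.2, gamma=0.003, rho=0.1, T=20.0)
moments = dict(m1=600.0, m2=4.0e5, n1=1.6, n2=2.8, k=700.0, m1_T=1300.0)
p = pi_star(**market, **moments)
print(p, foc_residual(p, **market, **moments))   # residual is zero up to rounding
```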

Recalling that $n_1^*(t_0)m_1^*(t_0) + p_1^*(t_0) = m_1^*(T)$, we observe that Eqs. (17) and (18)
are equal to Eqs. (19) and (20) in Christiansen and Steffensen [9], which means that
the latter equations are not only sufficient but also necessary.


5 Generalized Gradients for the Objective 


For differentiable functions on Euclidean space, a popular method to find maxima
is the gradient ascent method. We want to follow that variational concept;
however, our objective is a mapping on a function space. Therefore, we first need
to discuss the definition and calculation of proper gradient functions.

Theorem 3 Let $(\pi, c) \in D$ for $D$ as defined in Theorem 1. For each pair of continuous
functions $(h, l)$ on $[0, T]$ we have

$$\lim_{\delta\to 0}\frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \int_0^T h(s)\big(\nabla_\pi G(\pi, c)\big)(s)\,ds + \int_0^T l(s)\big(\nabla_c G(\pi, c)\big)(s)\,ds$$

with

$$\big(\nabla_\pi G(\pi, c)\big)(s) = (\alpha(s) - r(s))\Big(m_1(s)n_1(s)e^{-\rho T} - 2\gamma e^{-2\rho T}m_2(s)n_2(s)\Big(1 + \pi(s)\frac{\sigma(s)^2}{\alpha(s) - r(s)}\Big) - 2\gamma e^{-2\rho T}m_1(s)\big(k(s) - n_1(s)m_1(T)\big)\Big)$$

and

$$\big(\nabla_c G(\pi, c)\big)(s) = e^{-\rho s} - e^{-\rho T}n_1(s) + 2\gamma e^{-2\rho T}\big(m_1(s)n_2(s) + k(s) - n_1(s)m_1(T)\big).$$

The limit

$$\lim_{\delta\to 0}\frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \frac{d}{d\delta}G(\pi + \delta h, c + \delta l)\Big|_{\delta=0}$$

is the so-called Gâteaux derivative (or directional derivative) of the functional
$G$ at $(\pi, c)$ in direction $(h, l)$. Following Christiansen [7], we interpret the
two-dimensional function $(\nabla_\pi G(\pi, c), \nabla_c G(\pi, c))$ as the gradient of $G$ at $(\pi, c)$.
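A Gâteaux derivative formula can be tested numerically: for a direction $h$, the central difference $(G(\pi+\delta h, c) - G(\pi-\delta h, c))/(2\delta)$ should approach $\int_0^T h(s)(\nabla_\pi G(\pi,c))(s)\,ds$. The Python sketch below does this for the investment part only, with consumption held fixed, constant market parameters, and the parameter values of the numerical example in Sect. 7. The time grid, the Runge-Kutta solver for (16), the closed forms used for $n_1$, $n_2$, $k$ (valid for deterministic $\pi$), and all helper names are assumptions of this sketch. It evaluates $G$ through $E[X(T)] = m_1(T)$ and $\operatorname{Var}[X(T)] = m_2(T) - m_1(T)^2$.

```python
import numpy as np

# Parameter values of the numerical example in Sect. 7; consumption c is held fixed.
X0, R, ALPHA, SIGMA, T = 200.0, 0.04, 0.06, 0.2, 20.0
SAVINGS, C, RHO, GAMMA = 20.0, 80.0, 0.1, 0.003     # a - c = 100 - 80
N = 1000                                            # number of time steps (sketch choice)
TGRID = np.linspace(0.0, T, N + 1)

def integral_to_T(f):
    """Trapezoidal approximation of int_t^T f(s) ds at every grid point t."""
    seg = 0.5 * (f[1:] + f[:-1]) * np.diff(TGRID)
    return np.append(np.cumsum(seg[::-1])[::-1], 0.0)

def forward_moments(pi):
    """RK4 solution of (16) for m_1, m_2; pi is given on TGRID and interpolated linearly."""
    def rhs(s, y):
        q = np.interp(s, TGRID, pi)
        mu1 = R + (ALPHA - R) * q
        return np.array([mu1 * y[0] + SAVINGS,
                         (2 * mu1 + SIGMA**2 * q**2) * y[1] + 2 * SAVINGS * y[0]])
    y = np.empty((N + 1, 2))
    y[0] = [X0, X0**2]
    h = T / N
    for j in range(N):
        s, yj = TGRID[j], y[j]
        k1 = rhs(s, yj)
        k2 = rhs(s + h / 2, yj + h / 2 * k1)
        k3 = rhs(s + h / 2, yj + h / 2 * k2)
        k4 = rhs(s + h, yj + h * k3)
        y[j + 1] = yj + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[:, 0], y[:, 1]

def objective(pi):
    """G = int e^{-rho s} c ds + e^{-rho T} E[X(T)] - gamma e^{-2 rho T} Var[X(T)]."""
    m1, m2 = forward_moments(pi)
    consumption = C * (1.0 - np.exp(-RHO * T)) / RHO
    return (consumption + np.exp(-RHO * T) * m1[-1]
            - GAMMA * np.exp(-2 * RHO * T) * (m2[-1] - m1[-1] ** 2))

def gradient_pi(pi):
    """(nabla_pi G)(s) from Theorem 3 on TGRID, using the moment functions (14)."""
    m1, m2 = forward_moments(pi)
    mu1 = R + (ALPHA - R) * pi
    n1 = np.exp(integral_to_T(mu1))
    n2 = np.exp(integral_to_T(2 * mu1 + SIGMA**2 * pi**2))
    k = n1 * integral_to_T(SAVINGS * n2 / n1)
    e1, e2 = np.exp(-RHO * T), np.exp(-2 * RHO * T)
    return (ALPHA - R) * (m1 * n1 * e1
                          - 2 * GAMMA * e2 * m2 * n2 * (1 + pi * SIGMA**2 / (ALPHA - R))
                          - 2 * GAMMA * e2 * m1 * (k - n1 * m1[-1]))

pi0 = np.full(N + 1, 0.5)
direction = 1.0 + TGRID / T                         # an arbitrary smooth direction h
delta = 1e-4
fd = (objective(pi0 + delta * direction) - objective(pi0 - delta * direction)) / (2 * delta)
integrand = gradient_pi(pi0) * direction
analytic = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(TGRID))
print(fd, analytic)
```

Agreement of the two printed numbers is only evidence for the formula on this grid, not a proof.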

Proof (of Theorem 3) In the proof of Theorem 2 we already implicitly showed
that

$$G(\pi, c) - G\big(\pi + h\mathbf{1}_{(t_0-\varepsilon, t_0]}, c + l\mathbf{1}_{(t_0-\varepsilon, t_0]}\big) = -\big(\nabla_\pi G(\pi, c)\big)(t_0)\int_{t_0-\varepsilon}^{t_0} h(s)\,ds - \big(\nabla_c G(\pi, c)\big)(t_0)\int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon)$$

for all $t_0 \in [0, T]$, $(\pi, c) \in D$, and $h, l \in C([0, T])$. Defining an equidistant
decomposition of the interval $[0, T]$ by

$$\tau_i := \frac{i}{n}T, \quad i = 0, \dots, n,$$

we can rewrite the difference $G(\pi + \delta h, c + \delta l) - G(\pi, c)$ as

$$\begin{aligned} G(\pi + \delta h, c + \delta l) - G(\pi, c) &= \sum_{i=1}^n\Big(G\big(\pi + \delta h\mathbf{1}_{[0,\tau_i]}, c + \delta l\mathbf{1}_{[0,\tau_i]}\big) - G\big(\pi + \delta h\mathbf{1}_{[0,\tau_{i-1}]}, c + \delta l\mathbf{1}_{[0,\tau_{i-1}]}\big)\Big)\\ &= \delta\sum_{i=1}^n\Big(\nabla_\pi G\big(\pi + \delta h\mathbf{1}_{[0,\tau_{i-1}]}, c + \delta l\mathbf{1}_{[0,\tau_{i-1}]}\big)\Big)(\tau_i)\int_{\tau_{i-1}}^{\tau_i} h(s)\,ds\\ &\quad + \delta\sum_{i=1}^n\Big(\nabla_c G\big(\pi + \delta h\mathbf{1}_{[0,\tau_{i-1}]}, c + \delta l\mathbf{1}_{[0,\tau_{i-1}]}\big)\Big)(\tau_i)\int_{\tau_{i-1}}^{\tau_i} l(s)\,ds + \sum_{i=1}^n o(1/n) \end{aligned}$$

for all $0 < \delta < 1$. The moments $p_1, p_2, n_1, n_2, k$, interpreted as mappings of $(\pi, c)$
from the domain $B([0,T])^2$ with $L^2$-norm into the codomain $C([0,T])$ with supremum
norm, are continuous. Hence, the gradient functions on the right-hand side of
the last equation are continuous with respect to the parameters $\tau_{i-1}$ and $\tau_i$. Thus, for
$n \to \infty$ we obtain

$$\frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \int_0^T\big(\nabla_\pi G(\pi + \delta h\mathbf{1}_{[0,s]}, c + \delta l\mathbf{1}_{[0,s]})\big)(s)\,h(s)\,ds + \int_0^T\big(\nabla_c G(\pi + \delta h\mathbf{1}_{[0,s]}, c + \delta l\mathbf{1}_{[0,s]})\big)(s)\,l(s)\,ds.$$

Since the moment functions $p_1, p_2, n_1, n_2, k$ (interpreted as mappings of $(\pi, c)$
from the domain $B([0,T])^2$ with supremum norm into the codomain $C([0,T])$ with
supremum norm) are even uniformly continuous, the above gradient functions are
uniformly continuous with respect to the parameter $\delta$. Thus, for $\delta \to 0$ we end up with
the statement of the theorem.


6 Numerical Optimization by a Gradient Ascent Method 

With the help of the gradient function $(\nabla_\pi G(\pi, c), \nabla_c G(\pi, c))$ of the objective
$G(\pi, c)$, we can construct a gradient ascent method. A similar approach is also used
in Christiansen [8].

Algorithm 

1. Choose a starting control $(\pi^{(0)}, c^{(0)})$.

2. Calculate a new scenario by using the iteration

$$(\pi^{(i+1)}, c^{(i+1)}) := (\pi^{(i)}, c^{(i)}) + K\big(\nabla_\pi G(\pi^{(i)}, c^{(i)}), \nabla_c G(\pi^{(i)}, c^{(i)})\big),$$

where $K > 0$ is some step size that has to be chosen. If $c^{(i+1)}$ is above or below
the bounds $\overline{c}$ and $\underline{c}$, we cut it off at the bounds.

3. Repeat step 2 until $|G(\pi^{(i+1)}, c^{(i+1)}) - G(\pi^{(i)}, c^{(i)})|$ is below some error tolerance.

Fig. 1 Sequence of investment rates $\pi^{(i)}$, $i = 0, \dots, 40$, calculated by the gradient ascent method. The higher the number $i$, the darker the color of the corresponding graph
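Steps 1 to 3 can be sketched generically for controls discretized on a time grid. In the Python sketch below, `objective` and `gradient` are user-supplied callables (for the problem above they would evaluate $G$ and the gradient of Theorem 3), unconstrained components such as the investment rate get infinite bounds, and constrained components are cut off at their bounds as in step 2. The quadratic toy objective is hypothetical and only exercises the mechanics; its constrained maximizer is known in closed form.

```python
import numpy as np

def projected_gradient_ascent(objective, gradient, x0, K, lower, upper,
                              tol=1e-12, max_iter=10_000):
    """Steps 1-3 for a control discretized on a time grid.

    Step 2 is x <- clip(x + K * gradient(x), lower, upper); unconstrained
    components get bounds -inf and +inf. Step 3 stops once
    |objective(x_new) - objective(x_old)| < tol.
    """
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)    # step 1
    g_old = objective(x)
    for it in range(1, max_iter + 1):
        x = np.clip(x + K * gradient(x), lower, upper)       # step 2, cut off at the bounds
        g_new = objective(x)
        if abs(g_new - g_old) < tol:                         # step 3
            return x, g_new, it
        g_old = g_new
    return x, g_old, max_iter

# Hypothetical toy problem with two stacked controls on [0, 1]:
# J(x) = int (b x - x^2 / 2) dt, gradient function b - x, maximizer clip(b, lower, upper).
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
b = np.concatenate([np.sin(2 * np.pi * t), 1.5 * np.cos(np.pi * t)])
lower = np.concatenate([np.full(t.size, -np.inf), np.zeros(t.size)])   # first block free
upper = np.concatenate([np.full(t.size, np.inf), np.ones(t.size)])      # second block in [0, 1]

def toy_objective(x):
    return dt * np.sum(b * x - 0.5 * x**2)

x_opt, J_opt, n_it = projected_gradient_ascent(
    toy_objective, lambda x: b - x, np.zeros(2 * t.size), 0.5, lower, upper)
print(n_it, np.max(np.abs(x_opt - np.clip(b, lower, upper))))
```

Cutting off at the bounds after each step is a Euclidean projection onto the box of admissible values, which is why the iteration never leaves $[\underline{c}, \overline{c}]$.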


7 Numerical Example 

Here, we demonstrate the gradient ascent method of the previous section with a
numerical example. For simplicity, we fix the consumption rate $c$ and only control
the investment rate $\pi$. We take the same parameters as in Christiansen and Steffensen
[9] in order to have comparable results: for the Black-Scholes market we assume
that $r = 0.04$, $\alpha = 0.06$ and $\sigma = 0.2$. The time horizon is set to $T = 20$, the initial
wealth is $x_0 = 200$, and the savings rate is $a(t) - c(t) = 100 - 80 = 20$. The
preference parameter for consuming today instead of tomorrow is set to $\rho = 0.1$, and
the risk aversion parameter is set to $\gamma = 0.003$.

Starting from $\pi^{(0)} = 0.5$, Fig. 1 shows the converging sequence of investment rates
$\pi^{(i)}$, $i = 0, \dots, 40$, for $K = 0.2$. The last iteration step $\pi^{(40)}$ perfectly matches the
corresponding numerical result in Christiansen and Steffensen [9].
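The example can be reproduced in outline with the Python sketch below. It is our own discretization, not the author's code: the moment equations (16) are solved with a classical Runge-Kutta scheme on an equidistant grid, $n_1$, $n_2$ and $k$ come from their closed forms for deterministic $\pi$ and constant market parameters, and $\pi$ is updated with the fixed step $K = 0.2$ for 40 iterations as in the text. The grid size and helper names are assumptions. The output is not guaranteed to reproduce the figure exactly, and if the printed values of $G$ fail to increase in a different discretization, a smaller $K$ is needed.

```python
import numpy as np

# Setting of the numerical example: r = 0.04, alpha = 0.06, sigma = 0.2, T = 20, x0 = 200,
# a - c = 100 - 80 = 20, rho = 0.1, gamma = 0.003; c is fixed, only pi is controlled.
X0, R, ALPHA, SIGMA, T = 200.0, 0.04, 0.06, 0.2, 20.0
SAVINGS, C, RHO, GAMMA = 20.0, 80.0, 0.1, 0.003
N = 400                                   # time steps of the grid (an assumption of this sketch)
TGRID = np.linspace(0.0, T, N + 1)

def integral_to_T(f):
    """Trapezoidal approximation of int_t^T f(s) ds at every grid point t."""
    seg = 0.5 * (f[1:] + f[:-1]) * np.diff(TGRID)
    return np.append(np.cumsum(seg[::-1])[::-1], 0.0)

def forward_moments(pi):
    """RK4 solution of (16) for m_1, m_2 with pi interpolated linearly on TGRID."""
    def rhs(s, y):
        q = np.interp(s, TGRID, pi)
        mu1 = R + (ALPHA - R) * q
        return np.array([mu1 * y[0] + SAVINGS,
                         (2 * mu1 + SIGMA**2 * q**2) * y[1] + 2 * SAVINGS * y[0]])
    y = np.empty((N + 1, 2))
    y[0] = [X0, X0**2]
    h = T / N
    for j in range(N):
        s, yj = TGRID[j], y[j]
        k1 = rhs(s, yj)
        k2 = rhs(s + h / 2, yj + h / 2 * k1)
        k3 = rhs(s + h / 2, yj + h / 2 * k2)
        k4 = rhs(s + h, yj + h * k3)
        y[j + 1] = yj + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[:, 0], y[:, 1]

def objective(pi):
    """G = int e^{-rho s} c ds + e^{-rho T} E[X(T)] - gamma e^{-2 rho T} Var[X(T)]."""
    m1, m2 = forward_moments(pi)
    consumption = C * (1.0 - np.exp(-RHO * T)) / RHO
    return (consumption + np.exp(-RHO * T) * m1[-1]
            - GAMMA * np.exp(-2 * RHO * T) * (m2[-1] - m1[-1] ** 2))

def gradient_pi(pi):
    """Investment part of the gradient in Theorem 3, evaluated on TGRID."""
    m1, m2 = forward_moments(pi)
    mu1 = R + (ALPHA - R) * pi
    n1 = np.exp(integral_to_T(mu1))
    n2 = np.exp(integral_to_T(2 * mu1 + SIGMA**2 * pi**2))
    k = n1 * integral_to_T(SAVINGS * n2 / n1)
    e1, e2 = np.exp(-RHO * T), np.exp(-2 * RHO * T)
    return (ALPHA - R) * (m1 * n1 * e1
                          - 2 * GAMMA * e2 * m2 * n2 * (1 + pi * SIGMA**2 / (ALPHA - R))
                          - 2 * GAMMA * e2 * m1 * (k - n1 * m1[-1]))

def gradient_ascent(pi_start, K=0.2, n_iter=40):
    """Fixed-step iteration pi <- pi + K * nabla_pi G(pi); returns last pi and all G values."""
    pi = np.array(pi_start, dtype=float)
    history = [objective(pi)]
    for _ in range(n_iter):
        pi = pi + K * gradient_pi(pi)
        history.append(objective(pi))
    return pi, history

if __name__ == "__main__":
    pi_40, G_values = gradient_ascent(np.full(N + 1, 0.5), K=0.2, n_iter=40)
    print("G(pi^(0)) =", G_values[0], " G(pi^(40)) =", G_values[-1])
    print("pi^(40) at t = 0, 5, 10, 15, 20:", pi_40[::N // 4])
```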
Risk Control in Asset Management: Motives and Concepts


Thomas Dangl, Otto Randl and Josef Zechner 


Abstract In traditional portfolio theory, risk management is limited to the choice 
of the relative weights of the riskless asset and a diversified basket of risky secu- 
rities, respectively. Yet in industry, risk management represents a central aspect of 
asset management, with distinct responsibilities and organizational structures. We 
identify frictions that lead to increased importance of risk management and describe 
three major challenges to be met by the risk manager. First, we derive a frame- 
work to determine a portfolio position’s marginal risk contribution and to decide on 
optimal portfolio weights of active managers. Second, we survey methods to con- 
trol downside risk and unwanted risks since investors frequently have nonstandard 
preferences, which make them seek protection against excessive losses. Third, we 
point out that quantitative portfolio management usually requires the selection and 
parametrization of stylized models of financial markets. We, therefore, discuss risk 
management approaches to deal with parameter uncertainty, such as shrinkage pro- 
cedures or resampling procedures, and techniques of dealing with model uncertainty 
via methods of Bayesian model averaging. 
1 Introduction


In traditional portfolio theory the scope for risk management is limited. Wilson [63] 
showed that in the absence of frictions the consumption allocation of each agent in an 
efficient equilibrium satisfies a linear sharing rule as long as agents have equi-cautious 
HARA utilities. This implies that investors are indifferent between the universe of 
securities and having access to only two appropriately defined portfolio positions, a 
result that is usually referred to as the Two-Fund Separation Theorem. If a riskless 
asset exists, then these two portfolios can be identified as the riskless asset and the 
tangency portfolio. Risk management in this traditional portfolio theory is, therefore, 
trivial: the portfolio manager only needs to choose the optimal location on the line 
that combines the riskless asset with the tangency portfolio, i.e., on the capital market 
line. Risk management is thus equivalent to choosing the relative weights that should 
be given to the tangency portfolio and to the riskless asset, respectively. 

In a more realistic model that allows for frictions, risk management becomes a much
more central and complex component of asset management. First, a world with costly
information acquisition will feature informational asymmetries regarding the return
moments, as analyzed in the seminal paper by Grossman and Stiglitz [29]. In this setup,
investors generally do not hold the same portfolio of risky assets, and the two-fund
separation theorem breaks down (see, e.g., Admati [1]). We will refer to such portfolios
as active portfolios. In such a setup, risk management differs from the simple structure
described above for the traditional portfolio theory. Second, frictions such as costly
information acquisition frequently require delegated portfolio management, whereby an
investor transfers decision power to a portfolio manager. This gives rise to
principal-agent conflicts that may be mitigated by risk monitoring and portfolio risk
control. Third, investors may have nonstandard objective functions. For example, the
investor may incur large costs if the end-of-period portfolio value falls below a
critical level. This may be the case, for example, because investors are subject to their
own principal-agent conflicts. Alternatively, investors may be faced with model risk,
and thus be unable to derive probability distributions over possible portfolio outcomes.
In such a setting, investors may have nonstandard preferences, such as ambiguity
aversion. We will now discuss each of these deviations from the classical frictionless
paradigm and analyze how it affects portfolio risk management.


2 Risk Management for Active Portfolios 

If the optimal portfolio differs from the market portfolio, portfolio risk management
becomes a much more complicated and important task for the portfolio manager. For
active portfolios, individual positions' risk contributions are no longer fully determined
by their exposures to the systematic risk factors that affect the overall market
portfolio. A position's contribution to overall portfolio risk can then no longer be
measured by its sensitivity to the systematic risk factors, but must instead be measured
by its sensitivity to the investor's portfolio return. For active portfolios the manager
must, therefore, correctly measure each asset's risk contribution to the overall
portfolio risk and ensure that it corresponds to the expected return contribution of
the asset. We will now derive a simple framework that a portfolio manager may use to
achieve this.

We consider an investor who wishes to maximize his expected utility, $E[u]$. In
this section, we consider the case where the investor exhibits constant absolute risk
aversion with the coefficient of absolute risk aversion denoted by $A$. In the following
derivations, we borrow ideas from Sharpe [61] and assume for convenience that
investment returns and their dispersions are small relative to initial wealth, $V_0$. Thus,
we can approximate $A \approx \gamma/V_0$ with $\gamma$ denoting the investor's relative risk aversion.
This allows for easy translation of the results into the context of later sections, where
we focus on relative risk aversion.¹ An expected-utility maximizer with constant
absolute risk aversion solves

$$\max_w E[u] = \max_w E\big[-\exp\big(-A\,V_0(1 + w'r)\big)\big] = \max_w E\big[-\exp\big(-\gamma(1 + w'r)\big)\big], \tag{1}$$

where $w$ represents the $(N \times 1)$ vector of portfolio weights and $r$ is the $(N \times 1)$ vector
of securities returns. We make standard assumptions of mean-variance analysis,
and denote by $\mu^e$ the $(N \times 1)$ vector of securities' expected returns in excess of
the risk-free rate $r_f$, by $\sigma^2(w)$ the portfolio's return variance given weights $w$, and by
$\Sigma$ the covariance matrix of excess returns. $\mathrm{MR} = 2\Sigma w$ constitutes the vector of
marginal risk contributions resulting from a marginal increase in the portfolio weight of
the respective asset, i.e., $\mathrm{MR} = \partial\sigma^2/\partial w$, financed against the riskless asset. For
each asset $i$ in the portfolio we must, therefore, have

$$\mu_i^e = \frac{\gamma}{2}\,\mathrm{MR}_i, \tag{2}$$

which implies

$$\frac{\mu_i^e}{\mathrm{MR}_i} = \frac{\mu_j^e}{\mathrm{MR}_j} = \frac{1}{2}\gamma \quad \forall\, i, j. \tag{3}$$
These results show the fundamental difference between risk management for active and passive portfolios. While in the traditional world of portfolio theory each asset's risk contribution was easily measured by a constant beta coefficient (or vector of beta coefficients) with respect to the systematic risk factor(s), the active investor must measure a security's risk contribution by the sensitivity of the asset to the specific portfolio return, expressed by $MR_i = 2(\Sigma w)_i$. This expression makes clear that each position's marginal risk contribution depends not only on the covariance matrix $\Sigma$, but also on the portfolio weights, i.e., the chosen vector $w$. It actually converges to twice the portfolio variance, $2\sigma^2$, as the security's weight approaches one. In the case of active portfolios, these weights are likely to change over time, and so will each position's marginal risk contribution. The portfolio manager can no longer observe a position's relevant risk characteristics from readily available data providers, such as the stock's beta reported by Bloomberg, but must calculate the marginal risk contributions based on the portfolio characteristics. As shown in Eq. (3), a major responsibility of the portfolio risk manager now is to ensure that the ratios of securities' expected excess returns over their marginal risk contributions are equated.

1 See, e.g., Pennacchi [55] for more details on this assumption.
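As a minimal numerical sketch of Eq. (3), the Python snippet below computes $MR = 2\Sigma w$ for a hypothetical three-asset example. It assumes weights proportional to $\Sigma^{-1}\mu^e$ (the standard unconstrained mean-variance direction) and checks that the ratios $\mu^e_i / MR_i$ then coincide, while an arbitrary tilt of the weights breaks the equality. The inputs and the helper names `solve` and `marginal_risk` are illustrative assumptions, not part of the chapter.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def marginal_risk(cov, w):
    """MR = 2 * Sigma w, the gradient of the portfolio variance w' Sigma w."""
    return [2.0 * sum(c * wj for c, wj in zip(row, w)) for row in cov]


# Hypothetical annual inputs for three assets (illustration only).
mu_e = [0.04, 0.06, 0.05]
cov = [
    [0.0400, 0.0060, 0.0080],
    [0.0060, 0.0900, 0.0150],
    [0.0080, 0.0150, 0.0625],
]

raw = solve(cov, mu_e)                 # direction Sigma^{-1} mu_e
w = [x / sum(raw) for x in raw]        # normalize to a fully invested portfolio
ratios = [mu / mr for mu, mr in zip(mu_e, marginal_risk(cov, w))]
print("weights  :", [round(x, 4) for x in w])
print("mu_e / MR:", [round(x, 6) for x in ratios])

# Moving 5% of weight from asset 2 to asset 1 breaks the equality of the ratios.
w_tilted = [w[0] + 0.05, w[1] - 0.05, w[2]]
tilted = [mu / mr for mu, mr in zip(mu_e, marginal_risk(cov, w_tilted))]
print("tilted   :", [round(x, 6) for x in tilted])
```

The equality holds because $\Sigma w$ is proportional to $\mu^e$ for these weights, so every ratio equals the same constant.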


2.1 Factor Structure and Portfolio Risk 


A prevalent model of investment management in practice features a CIO who decides on the portfolio's asset allocation and on the allocation between passively and actively managed mandates within each asset class. The actual management of the positions within each asset class is then delegated to external managers. In the following we provide a consistent framework within which such a problem can be analyzed. We assume a linear return-generating process, so that the vector of asset excess returns, $r^e$, can be written as

$$r^e = \alpha + B f^e + \varepsilon, \qquad (4)$$


where

• $r^e$ is the $(N \times 1)$ vector of fund or manager returns in excess of the risk-free return

• $\alpha$ is the $(N \times 1)$ vector of manager alphas

• $B$ is an $(N \times K)$ matrix that denotes the exposure of each of the $N$ assets to the $K$ return factors

• $f^e$ is a $(K \times 1)$ vector of factor excess returns and

• $\varepsilon$ is the error term (with mean zero, independent of $f^e$).

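Let $\Sigma_f$ denote the covariance matrix of factor excess returns and $\Omega$ the covariance matrix of residuals, $\varepsilon$. Then, since $\alpha$ is constant and $\varepsilon$ is independent of $f^e$, the covariance matrix of managers' excess returns, $\Sigma$, is given by

$$\Sigma = \operatorname{Cov}(r^e) = \operatorname{Cov}(Bf^e + \varepsilon) = B\operatorname{Cov}(f^e)B' + \operatorname{Cov}(\varepsilon) = B\Sigma_f B' + \Omega.$$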

Let $w$ denote the $(N \times 1)$ vector of weights assigned to managers by the CIO; then the portfolio excess return $r^e_p$ is given by

$$r^e_p = w' r^e = w'\alpha + w'Bf^e + w'\varepsilon.$$
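If $e_i$ is the $i$th column of the $(N \times N)$ identity matrix, then

$$\begin{aligned}
\operatorname{Cov}(r^e_i, r^e_p) &= \operatorname{Cov}(e_i'Bf^e + e_i'\varepsilon,\; w'Bf^e + w'\varepsilon)\\
&= \operatorname{Cov}(e_i'Bf^e,\; w'Bf^e) + \operatorname{Cov}(e_i'\varepsilon,\; w'\varepsilon)\\
&= e_i'B\Sigma_f B'w + e_i'\Omega w.
\end{aligned}$$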


The beta of manager $i$'s return with respect to the portfolio is then

$$\beta_i = \frac{e_i'B\Sigma_f B'w + e_i'\Omega w}{w'(B\Sigma_f B' + \Omega)w}.$$


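Thus, we have an orthogonal decomposition of the vector of betas, $\beta$, into a part that is due to factor exposure, $\beta^S$, and a part that is due to the residuals of active managers (tracking error), $\beta^I$:

$$\beta = \frac{B\Sigma_f B'w + \Omega w}{w'(B\Sigma_f B' + \Omega)w} = \underbrace{\frac{B\Sigma_f B'w}{w'(B\Sigma_f B' + \Omega)w}}_{\beta^S} + \underbrace{\frac{\Omega w}{w'(B\Sigma_f B' + \Omega)w}}_{\beta^I}.$$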

We can now determine the beta of a pure factor excess return to the portfolio. With $e^F_k$ denoting the $k$th column of the $(K \times K)$ identity matrix, the covariance between the factor excess return and the portfolio excess return is

$$\operatorname{Cov}(f^e_k, r^e_p) = \operatorname{Cov}(e^{F\prime}_k f^e,\; w'Bf^e + w'\varepsilon) = e^{F\prime}_k \Sigma_f B'w.$$

The vector of pure factor betas, $\beta^F$, to the portfolio is therefore

$$\beta^F = \frac{\Sigma_f B'w}{w'(B\Sigma_f B' + \Omega)w}.$$

We thus have $\beta^S = B\beta^F$. Consequently, a position's beta to the portfolio can be written as

$$\beta = B\beta^F + \beta^I,$$

i.e., we can decompose the position's beta into the exposure-weighted betas of the pure factor returns plus the beta of the position's residual return.
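These identities can be checked numerically. The sketch below assumes a hypothetical example with $N = 3$ managers and $K = 2$ factors, builds $\Sigma = B\Sigma_f B' + \Omega$, computes $\beta$, $\beta^S$, $\beta^I$ and $\beta^F$ from the formulas above, and checks $\beta^S = B\beta^F$, $\beta = B\beta^F + \beta^I$, and that the weighted betas sum to one, $w'\beta = 1$. All numbers and the small matrix helpers are our own assumptions; plain lists are used so the sketch runs without third-party libraries.

```python
def matmul(a, b):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical monthly inputs: N = 3 managers, K = 2 factors.
B = [[1.0, 0.2], [0.8, 0.5], [0.3, 1.1]]                  # (N x K) exposures
Sigma_f = [[0.0020, 0.0004], [0.0004, 0.0010]]            # (K x K) factor covariance
Omega = [[0.0004, 0.0, 0.0], [0.0, 0.0009, 0.0], [0.0, 0.0, 0.0001]]  # residual covariance
w = [[0.5], [0.3], [0.2]]                                 # (N x 1) CIO weights

Bt = transpose(B)
systematic = matmul(matmul(B, Sigma_f), Bt)               # B Sigma_f B'
Sigma = add(systematic, Omega)                            # Sigma = B Sigma_f B' + Omega
var_p = matmul(matmul(transpose(w), Sigma), w)[0][0]      # sigma_p^2 = w' Sigma w

beta = [row[0] / var_p for row in matmul(Sigma, w)]       # total betas to the portfolio
beta_S = [row[0] / var_p for row in matmul(systematic, w)]
beta_I = [row[0] / var_p for row in matmul(Omega, w)]
beta_F = [row[0] / var_p for row in matmul(matmul(Sigma_f, Bt), w)]  # pure factor betas (K)
B_beta_F = [row[0] for row in matmul(B, [[b] for b in beta_F])]

print("beta     :", [round(x, 6) for x in beta])
print("beta^S   :", [round(x, 6) for x in beta_S])
print("B beta^F :", [round(x, 6) for x in B_beta_F])
print("beta^I   :", [round(x, 6) for x in beta_I])
print("w' beta  :", round(sum(wi[0] * b for wi, b in zip(w, beta)), 12))
```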

Next we can derive the vector of marginal risk contributions of the portfolio positions. Given the factor structure above, the effect of a small change in portfolio weights, $w$, on portfolio risk, $\sigma^2_p$, is given by $MR$:

$$\frac{1}{2}MR = \frac{1}{2}\frac{\partial}{\partial w}\left(w'\Sigma w\right) = \Sigma w = \sigma^2_p\,\beta = \sigma^2_p\,(B\beta^F + \beta^I).$$

Thus, an individual portfolio position $i$'s marginal risk contribution, $MR_i$, is given by

$$\frac{1}{2}MR_i = e_i'\Sigma w = \sigma^2_p\,\beta_i = \sigma^2_p\,(B_i\beta^F + \beta^I_i), \qquad (5)$$

where $B_i$ denotes the $i$th row of $B$.

2.2 Allocation to Active and Passive Funds

One important objective of risk control in a world with active investment strategies is to ensure that an active portfolio manager's contribution to the portfolio return justifies his idiosyncratic risk or "tracking error". If this is not the case, then it is better to replace the active manager with a passive position that only provides pure factor exposure but no idiosyncratic risk. To analyze this question we define $\nu^e$ as the vector of expected excess returns of the factor portfolios and assume without loss of generality $\nu^e > 0$. Then, the vector of expected excess returns of the portfolio positions can be written as

$$\mu^e = E(\alpha + Bf^e + \varepsilon) = \alpha + B\nu^e. \qquad (6)$$

The first-order optimality condition (3) states that the portfolio weight assigned to manager $i$ should not be reduced as long as its ratio of expected excess return to marginal risk contribution is at least that of a pure factor investment in factor $k$, whose marginal risk contribution is $MR^F_k = 2\sigma^2_p\beta^F_k$:

$$\frac{\mu^e_i}{MR_i} \ge \frac{\nu^e_k}{MR^F_k}.$$

Substituting the marginal risk contribution from (5) and the expected return from (6) into the above relation, we conclude that a manager $i$ with $MR_i > 0$ justifies her portfolio weight relative to a pure factor investment in factor $k$ iff

$$\frac{\alpha_i + B_i\nu^e}{B_i\beta^F + \beta^I_i} \ge \frac{\nu^e_k}{\beta^F_k}.$$

Consider the case where asset manager $i$ has exposure only to factor $k$, denoted by $B_{i,k}$. Then, this manager justifies her capital allocation iff

$$\frac{B_{i,k}\nu^e_k + \alpha_i}{\nu^e_k} \ge \frac{B_{i,k}\beta^F_k + \beta^I_i}{\beta^F_k} \iff B_{i,k} + \frac{\alpha_i}{\nu^e_k} \ge B_{i,k} + \frac{\beta^I_i}{\beta^F_k} \iff \alpha_i \ge \frac{\nu^e_k}{\beta^F_k}\,\beta^I_i.$$


Note that in general this condition depends on the portfolio weight. For sufficiently small weights $w_i$, manager $i$'s tracking error risk will be "non-systematic" in the portfolio context, i.e., $\beta^I_i \approx 0$. However, as manager $i$'s weight in the portfolio increases, his tracking error becomes "systematic" in the portfolio context. Therefore, the manager's hurdle rate increases with the portfolio weight. This is illustrated in Example 1.

Example 1 Consider the special case where there is only one factor and a portfolio that consists of a passive factor investment and a single active fund. The portfolio weight of the passive investment is denoted by $w_1$ and that of the active fund by $w_2$. The active fund is assumed to have a beta with respect to the factor denoted by $\beta$ and idiosyncratic volatility $\sigma_\varepsilon$ (see footnote 2).

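The covariance of factor returns is then a simple scalar equal to the factor return variance, $\Sigma_f = \sigma_f^2$, the matrix of factor exposures $B$ has dimension $(2 \times 1)$, and the idiosyncratic covariance matrix is $(2 \times 2)$:

$$B = \begin{pmatrix} 1 \\ \beta \end{pmatrix}, \qquad \Omega = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_\varepsilon^2 \end{pmatrix}.$$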

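The usual assumptions $\nu^e > 0$ and $\sigma_\varepsilon^2 > 0$ apply. With fully invested weights, $w_1 = 1 - w_2$, the hurdle to be met by the alpha of the active fund is accordingly given by

$$\alpha \ge H(w_2) = \nu^e\,\frac{\beta^I_2}{\beta^F} = \frac{\nu^e\,\sigma_\varepsilon^2\,w_2}{\sigma_f^2\,\bigl(1 - (1-\beta)\,w_2\bigr)}.$$

The derivative of this hurdle with respect to the weight of the active fund $w_2$ is

$$\frac{dH}{dw_2} = \frac{\nu^e\,\sigma_\varepsilon^2}{\sigma_f^2}\,\frac{1}{\bigl(1 - (1-\beta)\,w_2\bigr)^2} > 0,$$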

i.e., the hurdle $H(w_2)$ has a strictly positive slope; thus, the higher the portfolio weight $w_2$ of an active fund, the higher the alpha it must deliver. This is so because with a low portfolio weight, the active fund's idiosyncratic volatility is almost orthogonal to the portfolio return, and so its contribution to the overall portfolio risk is low. When, in contrast, the active fund has a high portfolio weight, its idiosyncratic volatility already co-determines the portfolio return and is, in the portfolio's context, a systematic component. The marginal risk contribution of the fund is then larger and consequently demands a higher compensation, translating into an upward-sloping alpha hurdle.

2 Note that $\beta$ is the linear exposure of the fund to the factor. It is a constant and independent of portfolio weights. In contrast, betas of portfolio constituents relative to the portfolio, $\beta^S$ and $\beta^I$, depend on weights.

Fig. 1 Minimum alpha justifying portfolio weights (plot title: Fund Alpha and Optimal Allocation)
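The hurdle of Example 1 and the resulting optimal weight can be sketched in a few lines of Python. Setting $\alpha = H(w_2)$ and solving for $w_2$ gives $w_2^* = \alpha\sigma_f^2 / (\nu^e\sigma_\varepsilon^2 + (1-\beta)\alpha\sigma_f^2)$, the weight at which the ratio of expected excess return to marginal risk contribution is the same for both positions. The values of $\nu^e$ and $\alpha$ are those quoted in the fund example that follows; the fund beta, idiosyncratic volatility, and factor volatility are hypothetical placeholders, not the chapter's estimates.

```python
def hurdle(w2, nu, beta, sig_eps, sig_f):
    """Minimum alpha H(w2) for the active fund when w1 = 1 - w2 (Example 1)."""
    return nu * sig_eps**2 * w2 / (sig_f**2 * (1.0 - (1.0 - beta) * w2))


def hurdle_slope(w2, nu, beta, sig_eps, sig_f):
    """Analytic derivative dH/dw2 = nu sig_eps^2 / (sig_f^2 (1 - (1 - beta) w2)^2)."""
    return nu * sig_eps**2 / (sig_f**2 * (1.0 - (1.0 - beta) * w2) ** 2)


def optimal_w2(alpha, nu, beta, sig_eps, sig_f):
    """Active-fund weight at which alpha exactly meets the hurdle, i.e. alpha = H(w2)."""
    return alpha * sig_f**2 / (nu * sig_eps**2 + (1.0 - beta) * alpha * sig_f**2)


def return_to_risk(w2, alpha, nu, beta, sig_eps, sig_f):
    """mu_i / MR_i for the passive factor position (1) and the active fund (2)."""
    w1 = 1.0 - w2
    # Sigma = B sig_f^2 B' + Omega with B = (1, beta)' and Omega = diag(0, sig_eps^2)
    s11, s12, s22 = sig_f**2, beta * sig_f**2, beta**2 * sig_f**2 + sig_eps**2
    mr1 = 2.0 * (s11 * w1 + s12 * w2)
    mr2 = 2.0 * (s12 * w1 + s22 * w2)
    return nu / mr1, (alpha + beta * nu) / mr2


# nu and alpha as quoted in the fund example; beta, sig_eps, sig_f are hypothetical monthly values.
nu, alpha = 0.013127, 0.001751
beta, sig_eps, sig_f = 0.6, 0.02, 0.045

w2_star = optimal_w2(alpha, nu, beta, sig_eps, sig_f)
print(f"w2* = {w2_star:.4f}, H(w2*) = {hurdle(w2_star, nu, beta, sig_eps, sig_f):.6f}")
print("mu/MR at w2*:", return_to_risk(w2_star, alpha, nu, beta, sig_eps, sig_f))
for w2 in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"w2 = {w2:.1f}: hurdle {hurdle(w2, nu, beta, sig_eps, sig_f):.6f}")
```

Because the hurdle depends only on the composition of the risky portfolio, the fully invested normalization $w_1 = 1 - w_2$ loses no generality here.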

Take as an example JPMorgan Funds — Highbridge US STEEP, an open-end fund 
incorporated in Luxembourg that has exposure primarily to U.S. companies, through 
the use of derivatives. Using monthly data from 12/2008 to 12/2013 (data source: 
Bloomberg), we estimate 


Furthermore, we use the historical average of the market risk premium, $\nu^e = 0.013127$, and the fund's estimated alpha, $\alpha = 0.001751$. The optimal allocation is the vector of weights $w^*$ such that the marginal excess return divided by the marginal risk contribution is equal for both assets in the portfolio. The increasing relationship between alpha and optimal fund weight is illustrated in Fig. 1. At the estimated alpha of 17.51 basis points, the optimal weights are given by
3 Dealing with Investors' Downside-Risk Aversion


When discussing the investor's utility optimization in Sect. 2, we referred to literature showing that under fairly general assumptions optimal static sharing rules are linear in the investment's payoff, i.e., optimal risk sharing implies holding a certain fraction of a risky investment rather than negotiating contracts with nonlinear payoffs. In a dynamic context, Merton [51] derives an optimal savings-consumption rule that is also in accordance with this finding. Consider a continuous-time framework with a single risky and a riskless asset, where the investor can change the allocation $w_t$ to the risky asset over time. When the risky asset follows a geometric Brownian motion with drift $\mu$ and volatility $\sigma$, the riskless rate is $r_f$, and utility exhibits constant relative risk aversion $\gamma$, then the optimal allocation to the risky asset is constant over time and can be described as $w_t = (\mu - r_f)/(\gamma\sigma^2)$. This means that with constant investment opportunities ($\mu$ and $\sigma$ constant over time) investors keep the proportions of the risky and riskfree assets in the portfolio unchanged over time. To keep weights constant, portfolio rebalancing requires buying the risky asset when it decreases in value and selling it when its price increases.
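A small numerical illustration of the Merton rule and the contrarian rebalancing it implies, assuming hypothetical values $\mu = 0.07$, $r_f = 0.02$, $\sigma = 0.2$ and $\gamma = 2.5$ (not taken from the chapter), and ignoring interest earned between the two dates for simplicity: after the risky asset falls, restoring the constant weight requires buying it; after it rises, selling it.

```python
def merton_weight(mu, r_f, sigma, gamma):
    """Constant optimal risky share (mu - r_f) / (gamma * sigma^2) under CRRA utility and GBM."""
    return (mu - r_f) / (gamma * sigma**2)


def rebalance_trade(risky, riskless, target_w):
    """Amount of the risky asset to buy (positive) or sell (negative) to restore target_w."""
    total = risky + riskless
    return target_w * total - risky


w_star = merton_weight(mu=0.07, r_f=0.02, sigma=0.20, gamma=2.5)
risky, riskless = 100 * w_star, 100 * (1 - w_star)

# The risky asset falls (or rises) by 20%; the constant-mix rule trades against the move.
after_fall = rebalance_trade(risky * 0.8, riskless, w_star)
after_rise = rebalance_trade(risky * 1.2, riskless, w_star)
print(f"w* = {w_star:.2f}, trade after fall = {after_fall:+.2f}, after rise = {after_rise:+.2f}")
```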

While these theoretical results suggest that an investor should not avoid exposure 
to risky investments even after sharp draw-downs of her portfolio’s value, financial 
intermediaries face strong demand for products that provide portfolio insurance. 
That is, investors seem to have considerable downside-risk aversion. Rebalancing to 
constant portfolio weights is in clear contrast to portfolio insurance strategies, where 
the allocation to the risky asset has to be decreased if it falls in value, and the risky 
asset will be purchased in response to price increases. Perold and Sharpe [56] note that 
these opposing rebalancing rules lead to different shapes of strategy payoff curves. 
Buying stocks as they fall (as in the Merton model) leads to concave payoff curves. 
Such strategies do well in flat but oscillating markets, as assets are bought cheaply and sold at higher prices. However, in persistent down markets, losses are aggravated by buying ever more stocks as they fall. Portfolio insurance rebalancing rules prescribe the opposite: selling stocks as they fall. This limits the impact of persistent down markets on the final portfolio value and at the same time keeps the potential of up markets intact, leading to a convex payoff profile. Yet if markets turn out flat but oscillating, convex strategies perform poorly.


3.1 Portfolio Insurance 

In this paper, we define portfolio insurance as a dynamic investment strategy that is 
designed to limit downside risk. The variants of portfolio insurance are, therefore, 
popular examples of convex strategies. The widespread use of portfolio insurance 
strategies among both individual and institutional investors indicates that not all 
market participants are equally capable of bearing the downside risk associated with 
their average holding of risky assets. Individual investors might be subject to habit 
formation or recognition of subsistence levels that define a minimum level of wealth
required. For corporations, limited debt capacity makes it impossible to benefit from 
profitable investment projects if wealth falls below a critical value. Furthermore, 
kinks in the utility function could originate in agency problems, e.g., career concerns 
of portfolio managers, who see fund flows and pay respond in an asymmetric way 
to performance. In the literature on portfolio insurance, Leland [47] has shown that convex strategies are preferable to concave ones for an investor whose risk aversion decreases with wealth more rapidly than that of the representative agent. Alternatively, portfolio insurance strategies should be demanded by investors with average risk tolerance but above-average return expectations. Leland argues that insured strategies allow such an optimistic investor to more fully exploit positive alpha situations through greater levels of risky investment, while still keeping risk within manageable bounds.

In contrast to this analysis, Brennan and Solanki [14] derive a formal condition for optimality of an option-like payoff that is typical for portfolio insurance. It can be
shown that a payoff function where the investor receives the maximum of the refer- 
ence portfolio’s value and a guaranteed amount is optimal only under the stringent 
conditions of a zero risk premium and linear utility for wealth levels in excess of the 
guaranteed amount. Similarly, Benninga and Blume [9] argue that in complete mar- 
kets utility functions consistent with optimality of portfolio insurance would have 
to exhibit unrealistic features, like unbounded risk aversion at some wealth level. 
However, they make the point that portfolio insurance can be optimal if markets are 
not complete. An extreme example of market incompleteness in this context, which 
makes portfolio insurance attractive, is the impossibility for an investor to allocate 
funds into the risk-free asset. Grossman and Vila [30] discuss portfolio insurance 
in complete markets, noting that the solution of an investor’s constrained portfolio 
optimization problem (subject to a minimum wealth constraint $V_T \ge K$) can be
characterized by the solution of the unconstrained problem plus a put option with 
exercise price K. More recently, Dichtl and Drobetz [19] provide empirical evidence 
that portfolio insurance is consistent with prospect theory, introduced by Kahneman 
and Tversky [41]. Loss-averse investors seem to use a reference point to evaluate 
portfolio gains and losses. They experience an asymmetric response to increasing 
versus decreasing wealth, in being more sensitive to losses than to gains. In addition, 
risk aversion also depends on the current wealth level relative to the reference point. 
The model by Gomes [27] shows that the optimal dynamic strategy followed by 
loss-averse investors can be consistent with portfolio insurance. 3 


3 It is interesting to study the potential effects of portfolio insurance on the aggregate market. As 
our focus is the perspective of a risk manager who does not take into account such market-wide
effects of his actions, we do not cover this literature. We refer the interested reader to Leland and 
Rubinstein [46], Brennan and Schwartz [13], Grossman and Zhou [32] and Basak [6] as a starting 
point. 
3.2 Popular Portfolio Insurance Strategies

The main portfolio insurance strategies used in practice are stop-loss strategies, option-based portfolio insurance (OBPI), constant proportion portfolio insurance (CPPI), ratcheting strategies with adjustments to the minimum wealth target, and value-at-risk-based portfolio insurance.


3.2.1 Stop-Loss Strategies 

The simplest dynamic strategy for an investor to limit downside risk is to protect his investment using a stop-loss strategy. In this case, the investor sets a minimum wealth target or floor $F_T$ that must be exceeded by the portfolio value $V_T$ at the investment horizon $T$. He then monitors whether the current value of the portfolio $V_t$ exceeds the present value of the floor, $\exp(-r_f(T-t))\,F_T$, where $r_f$ is the riskless rate of interest. When the portfolio value reaches the present value of the floor, the investor sells the risky and buys the riskfree asset. While this strategy has the benefit
of simplicity, there are several disadvantages. First, due to discreteness of trading or 
illiquidity of assets, the transaction price might be undesirably far below the price 
triggering portfolio reallocation. Second, once the allocation has switched into the 
riskfree asset the portfolio will grow deterministically at the riskfree rate, making 
it impossible to even partially participate in a possible recovery in the price of the 
risky asset. 
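The stop-loss rule can be sketched as a loop over discrete monitoring dates. The price path, riskless rate, and floor below are hypothetical. The sketch also reproduces the first disadvantage mentioned above: the switch happens at the first observed price below the trigger, so a gap between two monitoring dates leaves the portfolio below the floor, and the riskless position then cannot recover.

```python
import math


def stop_loss(prices, v0, floor_T, r_f, T):
    """Hold the risky asset until V_t <= exp(-r_f (T - t)) F_T, then switch to the riskless asset.

    prices: risky asset prices observed at equally spaced dates t = 0, dt, ..., T.
    Returns the terminal portfolio value and the date index of the switch (or None).
    """
    dt = T / (len(prices) - 1)
    units = v0 / prices[0]          # start fully invested in the risky asset
    cash = 0.0
    switched_at = None
    for i in range(1, len(prices)):
        cash *= math.exp(r_f * dt)  # riskless balance accrues interest
        value = units * prices[i] + cash
        pv_floor = math.exp(-r_f * (T - i * dt)) * floor_T
        if switched_at is None and value <= pv_floor:
            cash, units = value, 0.0  # sell everything at the observed (possibly gapped) price
            switched_at = i
    return units * prices[-1] + cash, switched_at


# Hypothetical path with a gap from 96 to 88 between two monitoring dates.
path = [100, 98, 96, 88, 92, 101, 105]
v_T, idx = stop_loss(path, v0=100.0, floor_T=95.0, r_f=0.02, T=1.0)
print(f"switched at date {idx}, terminal value {v_T:.2f} vs floor 95.00")
```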


3.2.2 Option-Based Portfolio Insurance 

Brennan and Schwartz [12] and Leland [47] show that portfolio insurance can be implemented in two equivalent ways: (1) holding the reference portfolio plus a put option, or (2) holding the riskfree asset plus a call option. When splitting his portfolio into a position $S_0$ in the risky asset and $P_0$ in a protective put option at time $t = 0$, the investor has to take into account the purchase price of the option when setting the exercise price $K$, solving $(S_0 + P_0(K))\cdot(F_T/V_0) = K$ for $K$. The ratio $F_T/V_0$ is the minimum wealth target expressed as a fraction of initial wealth. If such an option is available on the market, it can be purchased and no further action is needed over the investment horizon; alternatively, such an option can be synthetically replicated, as popularized by Rubinstein and Leland [58]. Again, the risky asset will be bought on price increases and sold on falling prices, but in contrast to the stop-loss strategy, changes in the portfolio allocation will now be implemented smoothly. Even after a fall in the risky asset's price there is scope to partially participate in an eventual recovery as long as the Delta of the replicated position (its exposure to the risky asset) is strictly positive. Toward the end of the investment horizon, Delta will generally be very close to either zero or one, potentially leading to undesired portfolio switching if the risky asset fluctuates around the present value of the exercise price.
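Solving $(S_0 + P_0(K))\cdot(F_T/V_0) = K$ requires a put price. A minimal sketch, assuming Black-Scholes pricing of a European put on a non-dividend-paying risky asset (our assumption; the chapter does not prescribe a pricing model) and hypothetical parameter values, finds $K$ by bisection and reports the share of wealth effectively held in the risky asset, $S_0 N(d_1)/(S_0 + P_0)$. Bisection is safe here because the left-hand side minus $K$ is strictly decreasing in $K$, so the root is unique.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF


def bs_put(S, K, r, sigma, T):
    """Black-Scholes price and d1 of a European put on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * N(-d2) - S * N(-d1), d1


def obpi_strike(S0, V0, floor_T, r, sigma, T, tol=1e-10):
    """Find K with (S0 + P0(K)) * (F_T / V0) = K by bisection."""
    f = floor_T / V0
    assert f * math.exp(-r * T) < 1.0, "floor too high to be insurable"

    def g(K):
        return (S0 + bs_put(S0, K, r, sigma, T)[0]) * f - K

    lo, hi = 1e-8, S0 * f / (1.0 - f * math.exp(-r * T))  # g(lo) > 0 >= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


# Hypothetical inputs: one-year horizon, floor of 95% of initial wealth.
S0, V0, F_T, r, sigma, T = 100.0, 100.0, 95.0, 0.02, 0.20, 1.0
K = obpi_strike(S0, V0, F_T, r, sigma, T)
P0, d1 = bs_put(S0, K, r, sigma, T)
units = V0 / (S0 + P0)                 # units of (asset + put) bought with V0
equity_share = S0 * N(d1) / (S0 + P0)  # share of wealth effectively in the risky asset
print(f"K = {K:.4f}, P0 = {P0:.4f}, guaranteed = {units * K:.4f}, equity share = {equity_share:.3f}")
```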
3.2.3 Constant Proportion Portfolio Insurance

In order to provide a simpler alternative to the option replication approach described above, Black and Jones [10] propose CPPI for equity portfolios. Black and Perold [11] describe properties of CPPI and propose a kinked utility function for which CPPI is the optimal strategy. Implementation of CPPI starts with calculation of the cushion $C_t = V_t - F_t$, which is the amount by which the current portfolio value $V_t$ exceeds the present value of the minimum wealth target, $F_t = \exp(-r_f(T-t))\,F_T$. Thus, the cushion can be interpreted as the risk capital available at time $t$. The exposure $E_t$ to the risky asset is determined as a constant multiple $m$ of the cushion $C_t$, while the remainder is invested risk free. To avoid excessive leverage, exposure will typically be determined subject to the constraint of a maximum leverage ratio $l$, hence $E_t = \min\{m \cdot C_t,\; l \cdot V_t\}$. If the portfolio is monitored in continuous time, the portfolio value at time $T$ cannot fall below $F_T$. However, discrete trading in combination with sudden price jumps could lead to a breach of the minimum wealth target (gap risk).
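A minimal discrete-time CPPI sketch using the cushion $C_t = V_t - F_t$, the exposure rule $E_t = \min\{m C_t, l V_t\}$ (floored at zero once the cushion is exhausted), and the discounted floor $F_t$. The multiplier, leverage cap, and both price paths are hypothetical. The second path contains a one-period drop of 25%; since an uncapped exposure loses $m C_t$ times the drop, any fall larger than $1/m = 20\%$ between rebalancing dates wipes out more than the cushion, which is the gap risk described above.

```python
import math


def cppi(prices, v0, floor_T, m, lev, r_f, T):
    """Discrete-time CPPI on equally spaced dates.

    At each rebalancing date the exposure is E_t = min(m * C_t, lev * V_t), floored at 0,
    where C_t = V_t - F_t and F_t = exp(-r_f (T - t)) F_T; lev is the maximum leverage ratio l.
    """
    dt = T / (len(prices) - 1)
    value = v0
    for i in range(len(prices) - 1):
        floor_t = math.exp(-r_f * (T - i * dt)) * floor_T   # present value of the floor
        cushion = value - floor_t
        exposure = max(0.0, min(m * cushion, lev * value))
        risky_return = prices[i + 1] / prices[i] - 1.0
        riskless_return = math.exp(r_f * dt) - 1.0
        value += exposure * risky_return + (value - exposure) * riskless_return
    return value


calm = [100, 97, 94, 96, 99, 103, 108]
crash = [100, 102, 104, 78, 80, 82, 84]   # the 25% one-period drop exceeds 1/m = 20%
for name, path in (("calm", calm), ("crash", crash)):
    v_T = cppi(path, v0=100.0, floor_T=95.0, m=5.0, lev=1.0, r_f=0.02, T=1.0)
    print(f"{name:5s}: terminal value {v_T:.2f} (floor 95.00)")
```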


3.2.4 Ratcheting Strategies 

The portfolio insurance strategies discussed so far limit the potential shortfall from 
the start of the investment period to its end, frequently a calendar year. But investors 
may also be concerned with losing unrealized profits that have been earned within 
the year. Estep and Kritzman [23] propose a technique called TIPP (time invariant 
portfolio insurance) as a simple way of achieving (partial) protection of interim gains 
in addition to the protection offered by CPPI. Their methodology adjusts the floor $F_t$ used to calculate the cushion $C_t$ over time. The TIPP floor is set as the maximum of last period's floor and a fraction $k$ of the current portfolio value: $F_t = \max(F_{t-1}, k V_t)$. This method of ratcheting the floor up is time invariant in the sense that the notion of a target date $T$ is lost. However, if the percentage protection is required with respect to a specific target date, the method can be easily adjusted by setting a target date floor $F_T$ proportional to the current portfolio value $V_t$, which is then discounted. Grossman
and Zhou [31] provide a formal analysis of portfolio insurance with a rolling floor, 
while Brennan and Schwartz [13] characterize a complete class of time-invariant 
portfolio insurance strategies, where asset allocation is allowed to depend on current 
portfolio value, but is independent of time. 
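TIPP changes only the way the floor is updated. A minimal sketch of the ratchet $F_t = \max(F_{t-1}, kV_t)$ combined with the CPPI exposure rule, assuming (for simplicity) a zero riskless rate, a hypothetical protection fraction $k = 0.9$, multiplier $m = 4$, no leverage, and a hypothetical index path. The floor locks in part of the early gains and never moves down afterwards.

```python
def tipp(prices, v0, k, m):
    """Time-invariant portfolio insurance (TIPP) with a zero riskless rate.

    Before each period the floor is ratcheted to F_t = max(F_{t-1}, k * V_t) and the
    risky exposure is m * (V_t - F_t), capped at V_t (no leverage) and floored at 0.
    Returns (value, floor) pairs: the initial pair, then the value at the end of each
    period together with the floor in force during that period.
    """
    value, floor = v0, k * v0
    history = [(value, floor)]
    for i in range(len(prices) - 1):
        floor = max(floor, k * value)                    # the floor can only move up
        exposure = max(0.0, min(m * (value - floor), value))
        value += exposure * (prices[i + 1] / prices[i] - 1.0)
        history.append((value, floor))
    return history


path = [100, 110, 120, 105, 95, 100]                     # hypothetical index levels
for v, f in tipp(path, v0=100.0, k=0.9, m=4.0):
    print(f"value {v:7.2f}   floor {f:7.2f}")
```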


3.2.5 Value-at-Risk-Based Portfolio Insurance 

In a broader context, Value-at-Risk (VaR) has emerged as a standard for measurement 
and management of financial market risk. VaR has to be specified with a confidence level $\alpha$ and a horizon $\Delta t$; it is the loss amount that will be exceeded only with probability $(1 - \alpha)$ over the time span $\Delta t$. It is, therefore, a natural measure to control portfolio
drawdown risk. The typical definition of VaR assumes that over the time horizon 
no adjustments are made to the portfolio. Yet, if under adverse market movements 


Risk Control in Asset Management: Motives and Concepts 




risk-reducing transactions are implemented, VaR is likely to overestimate actual losses, making portfolio insurance even more effective. On the other hand, poor estimation of the return distribution will lead to poor VaR estimates. Herold et al. [35, 36] describe a VaR-based method for controlling shortfall risk. The allocation to the risky asset is chosen such that the VaR equals the prespecified minimum return. They note that their method can be seen as a generalized version of CPPI with a dynamic multiplier $m_t = 1/(\Phi^{-1}(\alpha)\sqrt{\Delta t}\,\sigma_t)$, where $\Phi^{-1}(\alpha)$ is the $\alpha$-quantile of the standard normal distribution and $\sigma_t$ is the volatility of the reference portfolio. Typically, market volatility increases when markets crash, leading to a more pronounced reduction of the allocation to the risky asset as both the cushion and the multiplier shrink. A potential advantage of VaR-based risk control is that once markets calm, the allocation to the risky asset increases again, allowing the portfolio to benefit from a recovery. Basak and Shapiro [7] take a critical view of VaR-based risk management: Strictly interpreting VaR as a risk quantile, managers could be inclined to deliberately assume extreme risks if they are not penalized for the severity of losses that occur with a probability less than $1 - \alpha$. However, in a portfolio insurance context this could be easily fixed, e.g., by restrictions on assuming tail risks.
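The dynamic multiplier follows from requiring that the VaR of the risky exposure over one period, $E_t\,\Phi^{-1}(\alpha)\,\sigma_t\sqrt{\Delta t}$, equals the cushion $C_t$ under a normal approximation. A minimal sketch with a hypothetical cushion, confidence level, and volatility levels shows how a volatility spike shrinks the multiplier and hence the exposure, while the VaR of the exposure stays pinned to the cushion:

```python
import math
from statistics import NormalDist


def var_multiplier(alpha, sigma_t, dt):
    """m_t = 1 / (Phi^{-1}(alpha) * sqrt(dt) * sigma_t), assuming normally distributed returns."""
    return 1.0 / (NormalDist().inv_cdf(alpha) * math.sqrt(dt) * sigma_t)


alpha, dt = 0.99, 1.0 / 252     # hypothetical: 99% confidence, one trading day
cushion = 2.0                   # hypothetical cushion V_t - F_t (portfolio value around 100)
z = NormalDist().inv_cdf(alpha)

results = []
for sigma_t in (0.15, 0.30, 0.60):                   # annualized volatility, calm to stressed
    m_t = var_multiplier(alpha, sigma_t, dt)
    exposure = m_t * cushion                         # CPPI-style exposure E_t = m_t * C_t
    var_1d = exposure * z * sigma_t * math.sqrt(dt)  # normal VaR of the exposure
    results.append((sigma_t, m_t, exposure, var_1d))
    print(f"sigma={sigma_t:.2f}: m_t={m_t:5.1f}, exposure={exposure:6.1f}, VaR={var_1d:.2f}")
```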


3.3 Performance Comparison 

Benninga [8] uses Monte Carlo simulation techniques to compare stop-loss, OBPI, 
and CPPI. Surprisingly, he finds that stop-loss dominates with respect to terminal 
wealth and Sharpe ratio. Dybvig [21] considers asset allocation and portfolio payouts 
in the context of endowment management. If payouts are not allowed to decrease, 
CPPI exhibits more desirable properties than constant mix strategies. Balder et al. [4] 
analyze risks associated with implementation of CPPI under discrete-time trading 
and transaction costs. Zagst and Kraus [64] compare OBPI and CPPI with respect 
to stochastic dominance. Taking into account that implied volatility, which is relevant for OBPI, is usually higher than realized volatility, which is relevant for CPPI, they find that under specific parametrizations CPPI dominates. Recently, Dockner [20]
compares buy-and-hold, OBPI and CPPI concluding that there does not exist a clear 
ranking of the alternatives. Dichtl and Drobetz [19] consider prospect theory (Kahneman and Tversky [41]) as a framework to evaluate portfolio insurance strategies.
They use a twofold methodological approach: Monte Carlo simulation and historical 
simulation with data for the German stock market. Within the behavioral finance 
context chosen, their findings provide clear support for the justification of downside 
protection strategies. Interestingly, in their study stop-loss, OBPI, and CPPI turn out to be attractive, while TIPP, whose high protection level comes with opportunity costs in the form of reduced upside potential, turns out to be suboptimal. Finally, they recommend implementing CPPI aggressively by using the highest multiplier $m$ consistent with the investor's tolerance for overnight or gap risk.
Example 2 In 4 out of the 18 calendar years from 1995 to 2013, the S&P 500 total
return index lost more than 5 %. For investors with limited risk capacity it was not 
helpful that these losses happened three times in a row (2000, 2001, and 2002), 
or were severe (2008). The following example illustrates how simple versions of 
common techniques to control downside risk have performed over these 18 years. 
We assume investment opportunities in the S&P 500 index and a risk-free asset, an 
investment horizon equal to the calendar year, and a frictionless market (no trans- 
action costs). Each calendar year the investment starts with a January 1st portfolio 
value of 100. Rebalancing is possible with daily frequency. For the portfolio insurance strategies investigated, the desired minimum wealth is set to 95, and free parameters are chosen to make the strategies comparable by ensuring equal equity allocations at portfolio start. This is achieved by resetting the multiples $m$
for CPPI and TIPP each January 1st according to the Delta of the OBPI strategy. 
Similarly, the VaR confidence level is set to achieve this same equity proportion at 
the start of the calendar year. OBPI Delta also governs the initial equity portion of 
the buy-and-hold portfolio. Table 1 reports the main results, and Fig. 2 summarizes 
the distribution of year-end portfolio values in a box plot. 

The achieved minimum wealth levels show that for CPPI, TIPP, OBPI, and VaR-based portfolio insurance, even in the worst year the desired minimum wealth has been missed at most slightly, while in the case of the stop-loss strategy there is a considerable gap. This can be partly explained by the simple setup of the example (e.g., rebalancing using daily closing prices only, while in practice intraday decision-making and trading will happen). But a possibly large gap between desired and achieved minimum wealth is also characteristic of stop-loss strategies because of the mechanics of stop-loss orders. The moment the stop limit is reached, a market order to sell the entire portfolio is executed. The trading price, therefore, can and frequently will be lower than the limit. This can pose considerable problems in highly volatile and illiquid market environments. Option replication shows the next-largest shortfall relative to the desired minimum wealth.


Table 1 Portfolio insurance strategies

Strategy       Mean     Median   SD all   SD lower   Min     Max      Turnover
Long only      110.92   113.91   19.29    15.55      63.91   137.59   0.00
Buy & hold     108.32   107.15   12.52     9.18      79.99   132.11   0.00
Stop loss      108.77   105.77   16.09     6.52      89.13   137.59   0.42
CPPI           107.71   104.77   12.52     2.82      94.62   136.89   4.58
TIPP           105.31   104.20    7.45     3.29      94.63   122.75   1.03
OBPI           108.50   105.21   12.43     4.29      95.00   135.58   3.63
Option repl.   108.84   107.07   11.93     4.72      92.04   132.59   3.64
VaR            108.21   104.15   13.21     2.64      94.79   137.59   8.16

Comparison of portfolio insurance strategies, annual horizon, S&P 500, 1995-2013. We report end-of-year wealth levels per investment of 100 (mean, median, min, max); standard deviations calculated both over the whole sample (SD all) and for the subsample where the annual S&P 500 total return is below its mean (SD lower); turnover is the annual turnover ratio
Fig. 2 Comparison of portfolio insurance strategies, annual horizon, S&P 500, 1995-2013 (plot title: Portfolio Insurance Strategies). For each strategy, the shaded area indicates the observations from the 25th to the 75th percentile, the median is shown as the line across the box and the mean as a diamond within the box. The whiskers denote the lowest datum still within 1.5 interquartile ranges of the lower quartile, and the highest datum still within 1.5 interquartile ranges of the upper quartile. If there are more extreme observations, they are shown separately by a circle. The semitransparent horizontal line indicates the desired minimum wealth level


In the example, this might be due to the simplified setup, where the exercise price of the option to be replicated is determined only once per year (at year start), and the daily Delta is then calculated for this option and used for allocation into the risky and the riskless asset. In practice, new information on volatility and the level of interest rates will also lead to a reset of the strike used for calculation of the Delta. Another observation is that the standard deviation of annual returns is lowest for TIPP, which comes at the price of the lowest average return. If the standard deviation is computed only for the years with below-average S&P 500 returns, it is lowest for VaR-based risk control. For all methods shown, practical implementation will typically use higher levels of sophistication. For example, trading filters will be applied to avoid adjusting portfolios as frequently as in the example, which leads to the high turnover values reported in Table 1.


3.4 Other Risks 

In the previous discussion, shortfall risk was seen from the perspective of an investor 
holding assets only. However, many institutional investors simultaneously optimize 
a portfolio of assets $A$ and liabilities $L$. Sharpe and Tint [62] describe a flexible approach to systematically incorporate liabilities into pension fund asset allocation, by optimizing over a surplus measure $S = A - kL$, where $k \in [0, 1]$ is a factor denoting the relative weight attached to liabilities. In the context of asset liability
management, Ang et al. [2] analyze the effect of downside risk aversion, and offer 
an explanation why risk aversion tends to be high when the value of the assets 
approaches the value of the liabilities. Ang et al. [2] specify the objective function 
of the fund as mean-variance over asset returns plus a downside risk penalty on the liability shortfall that is proportional to the value of an option to exchange the optimal portfolio for the random value of the liabilities. An investor following their advice tends to be more risk averse than a portfolio manager implementing the Sharpe and Tint [62] model. For very high funding ratios, the impact of downside risk on risk taking, and therefore on the asset allocation of the pension fund manager, is small. For
deeply underfunded plans, the value of the option is also relatively insensitive to 
changes in volatility, again leading to a small impact on asset allocation. The effect 
of liabilities on asset allocation is strongest when the portfolio value is close to the 
value of liabilities. In this case, lower volatility reduces the value of the exchange 
option, leading to a smaller penalty. 
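The funding-ratio pattern can be illustrated with a standard exchange-option valuation. As a minimal sketch, assume (our assumption, not necessarily the specification used by Ang et al. [2]) that assets $A$ and liabilities $L$ are jointly lognormal, so that the option to exchange the portfolio for the liabilities, with payoff $\max(L_T - A_T, 0)$, can be valued with Margrabe's exchange-option formula using the volatility $\sigma$ of the ratio $A/L$. Its sensitivity to $\sigma$ (vega) is tiny for very high and very low funding ratios and largest near a funding ratio of one. Liability value, volatility, and horizon are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def exchange_option(A, L, sigma, T):
    """Margrabe value of max(L_T - A_T, 0) and its vega; sigma is the volatility of A/L."""
    d1 = (math.log(L / A) + 0.5 * sigma**2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    value = L * STD.cdf(d1) - A * STD.cdf(d2)
    vega = L * STD.pdf(d1) * math.sqrt(T)
    return value, vega


L, sigma, T = 100.0, 0.10, 1.0   # hypothetical liability value, surplus volatility, horizon
for funding_ratio in (0.6, 0.9, 1.0, 1.1, 1.6):
    value, vega = exchange_option(funding_ratio * L, L, sigma, T)
    print(f"A/L = {funding_ratio:.1f}: option value {value:6.2f}, vega {vega:6.2f}")
```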

Another hedging motive arises if investors wish to bear only specific risks. This 
might be due to specialization of the investor in a certain asset class, making it 
desirable to hedge against risks not primarily driving the returns of this asset class. 
A popular example is currency risk, which has been recently analyzed by Campbell 
et al. [15] who find full currency hedging to be optimal for a variance-minimizing 
bond investor, but discuss the potential for overall risk reduction from keeping foreign 
exchange exposure partly unhedged in the case of equity portfolios. 


4 Parameter Uncertainty and Model Uncertainty 

Quantitative portfolio management builds on optimization output of stylized models, 
which (i) need to be carefully chosen to capture relevant features of the market 
framework and (ii) must be calibrated and parameterized. These choices, model 
selection, as well as model calibration, bear the risk of misspecification, which might 
have severely negative consequences on the desired out-of-sample properties of the 
portfolio. Thus, a main application of risk management in asset management is
controlling the risk inherent in model specification and parameter selection. In this 
section, we distinguish between parameter uncertainty and model uncertainty in the 
following way. With parameter uncertainty we refer to the case where we know the 
structure of the data generating process that lies behind the observed set of data, but 
the parameters of the process must be empirically determined. 4 Finite data history 
is the only limiting factor, which prevents us from deriving the true values of the 
model parameters. Under the assumption of the null hypothesis, we can derive the 
joint distribution of the estimated parameters relative to the true values, and finally 
the joint predictive distribution of asset returns under full consideration of estimation 
problems. Thus, we can treat parameter uncertainty simply as an additional source of 
variability in returns. It is noncontroversial to assume that a decision-maker does not 
distinguish between uncertainty in returns caused by the general variability of returns 
and uncertainty that has its origin in estimation problems, and hence the portfolio 
optimization paradigm is not affected. 

In contrast, with model uncertainty we refer to the case where a decision-maker
is not sure which model is the correct formulation of the underlying dynamics of
asset returns. In such a case, it is generally not possible to specify probabilities
for the models considered feasible. Thus, model uncertainty increases uncertainty
about asset returns, but we are not able to state a definite probability distribution
of returns that incorporates model uncertainty. That is, model uncertainty is a
prototypical case of Knightian uncertainty in the sense of Knight [42], where it is
not possible to characterize the uncertain entity (in our case, the asset return) by
means of a probability distribution. Consequently, model uncertainty fundamentally
changes the decision-making framework, and we have to make assumptions regarding
a decision-maker's preferences in situations of ambiguity.


4.1 Parameter Uncertainty 


The most obvious estimation problem in a traditional minimum-variance portfolio
optimization task arises when determining the covariance structure of asset returns.
Sample covariance matrices turn out to be ill-conditioned in general, and as soon as
the number of assets is larger than the number of periods in the return history, the
sample covariance matrix is singular by construction.

Example 3 Consider as a broad asset universe the S&P 500 with N = 500
constituents. It is common practice to estimate the covariance structure of stock
returns from two years of weekly returns. Restricting the history to T = 104 weeks
is a reaction to the fact that there is apparently some time variation

4 We assume throughout that the model has a structure that ensures that the parameters are
identifiable. For example, it is assumed that log-returns are normally distributed, but mean and
variance must be estimated from observed data.
in the covariance structure, which the estimate can capture only if one restricts
the history used. 5

Let r denote the (T × N) matrix containing weekly returns; then the sample
covariance matrix $\Sigma_S$ is determined by

$$\Sigma_S = \frac{1}{T-1}\, r' M r, \qquad (7)$$

where the symmetric and idempotent matrix M is the residual maker with respect to
a regression onto a constant,

$$M = I - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}',$$

with I the (T × T) identity matrix and $\mathbf{1}$ a column vector containing T times the
constant 1.

In the assumed setup, the sample covariance matrix is singular by construction.
This is because it follows from (7) that the rank of $\Sigma_S$ is bounded from above by
$\min\{N, T-1\}$. 6 And even when the number of return observations per asset
exceeds the number of assets ($T \geq N + 1$), the sample covariance matrix is
weakly determined and hence subject to large estimation errors, since one has to
estimate $N(N+1)/2$ distinct elements of $\Sigma_S$ from $T \cdot N$ observations.
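As a numerical sketch of (7) and of the rank bound, the following NumPy snippet builds $\Sigma_S$ with the residual maker M and checks it against NumPy's `np.cov`. It uses simulated i.i.d. normal returns with the dimensions of Example 3, not actual S&P 500 data. The function name is hypothetical.

```python
import numpy as np


def sample_cov_residual_maker(r: np.ndarray) -> np.ndarray:
    """Sample covariance (7): Sigma_S = r' M r / (T - 1), M = residual maker."""
    T = r.shape[0]
    ones = np.ones((T, 1))
    # M = I - 1 (1'1)^{-1} 1' projects out the constant, i.e. demeans each column.
    M = np.eye(T) - ones @ np.linalg.inv(ones.T @ ones) @ ones.T
    return r.T @ M @ r / (T - 1)


# Hypothetical data with the dimensions of Example 3: T = 104 weeks, N = 500 stocks.
rng = np.random.default_rng(seed=0)
T, N = 104, 500
r = rng.normal(loc=0.002, scale=0.03, size=(T, N))

Sigma_S = sample_cov_residual_maker(r)
rank = np.linalg.matrix_rank(Sigma_S)
print(f"rank = {rank}, upper bound min(N, T - 1) = {min(N, T - 1)}")
```

Forming M explicitly is wasteful for large T, because `r - r.mean(axis=0)` gives the same demeaned matrix. It is kept here only to mirror (7) literally.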

Since a simple Markowitz optimization (see Markowitz [49]) needs to invert the
covariance matrix, matrix singularity rules out any attempt at more advanced portfolio
optimization and is thus the most evident estimation problem in portfolio
management. Elton and Gruber [22] is an early contribution that proposes the use of
structural estimators of the covariance matrix. Jobson and Korkie [38] provide a
rigorous analysis of the small-sample properties of estimates of the covariance
structure of returns.

Less evident, but economically much more critical, are the problems caused by errors
in the estimates of expected returns. Jorion [39] shows in the context of international
equity portfolio selection that errors in the estimates of expected returns have a
severe impact on the out-of-sample

5 Such an approach is typical for dealing with inadequate model specification. The formal estimate 
is based on the assumption that the covariance structure is stable. Since data show that the covariance 
structure is not stable, an ad-hoc adaptation — the limitation of the data history — is used to capture 
the recent covariance structure. The optimal amount of historical data that should be used cannot 
be derived within the model, but must be roughly calibrated to some measure of goodness-of-fit, 
which balances estimation error against timely response to time variations. 

6 The residual maker M has at most rank T − 1 because it generates residuals from a projection
onto a one-dimensional subspace of $\mathbb{R}^T$. Since r has at most rank N, we have

$$\operatorname{rank}(\Sigma_S) \leq \min\{N, T-1\}.$$

For example, the sample covariance matrix estimated from two years of weekly returns of the 500
constituents of the S&P 500 (104 observations per stock) has at most rank 103. Hence, it is not
positive definite and not invertible, because at least 397 of its 500 eigenvalues are exactly equal to 0.
performance of optimized portfolios. He further shows that the Bayes-Stein
shrinkage approach introduced in Jorion [40] helps mitigate these errors and at the
same time improves the out-of-sample properties of the portfolio.
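To make the Bayes-Stein idea concrete, here is a minimal sketch of one common textbook form of the estimator. The sample means are shrunk toward the expected return of the global minimum-variance portfolio, with a data-driven intensity w. The covariance scaling factor (T − 1)/(T − N − 2), the function name, and the simulated data are assumptions of this illustration. Jorion [40] gives the exact specification.

```python
import numpy as np


def bayes_stein_means(returns: np.ndarray):
    """Bayes-Stein shrinkage of sample means toward the minimum-variance portfolio mean.

    One common textbook form; requires T > N + 2. Returns (shrunk means, intensity w).
    """
    T, N = returns.shape
    if T <= N + 2:
        raise ValueError("need T > N + 2 observations")
    mu_hat = returns.mean(axis=0)
    # Scaled sample covariance (assumed convention for this illustration).
    S = np.cov(returns, rowvar=False) * (T - 1) / (T - N - 2)
    S_inv = np.linalg.inv(S)
    ones = np.ones(N)
    # Shrinkage target: expected return of the global minimum-variance portfolio.
    mu0 = (ones @ S_inv @ mu_hat) / (ones @ S_inv @ ones)
    d = mu_hat - mu0 * ones
    # Intensity in (0, 1]: strong shrinkage when the sample means are close to mu0.
    w = (N + 2) / ((N + 2) + T * (d @ S_inv @ d))
    return (1.0 - w) * mu_hat + w * mu0 * ones, w


# Hypothetical data: 60 monthly returns on 5 assets.
rng = np.random.default_rng(seed=1)
R = rng.normal(loc=0.005, scale=0.05, size=(60, 5))
mu_bs, w = bayes_stein_means(R)
print("shrinkage intensity w =", round(float(w), 3))
print("sample means:     ", np.round(R.mean(axis=0), 4))
print("Bayes-Stein means:", np.round(mu_bs, 4))
```

The shrunk means are a convex combination of the sample means and a common target. Their cross-sectional dispersion therefore falls by the factor 1 − w, which dampens the extreme bets that noisy mean estimates would otherwise induce.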

Structural Estimators Means and covariances of asset returns are the most basic
inputs into a portfolio optimization model. However, further model parameters, such as
a measure of risk aversion or the speed of reversion to long-term averages, must also
be estimated from empirical data and are thus equally likely to be affected by
estimation errors. While sample estimates of distribution means, (co-)variances, and
higher moments are generally unbiased and efficient, they tend to be noisy. This can be
improved by imposing some structure on the estimated parameters. Such structural
estimates are less prone to estimation errors, at the expense of ignoring part of the
information in the observed data sample. When determining the covariance structure
of asset returns, Elton and Gruber [22] analyze a set of different structural
assumptions, e.g., what they call the single index model (assuming that the pairwise
covariance of asset returns is generated only by the assets' individual correlations
with a market index), the mean model (pairwise correlations between assets are
assumed constant across the asset universe), and models that assume that the
correlation structure of asset returns is determined by within-industry or
across-industry averages or by a (small) number of principal components of the
sample covariance matrix. They show that the particularly restrictive estimates
(single index model and mean model) deliver forecasts of future correlation that are
more accurate than the simple historical sample estimates. 7
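The mean model of Elton and Gruber [22] can be sketched as follows. This is a minimal illustration on simulated data, and the function name is hypothetical. The sketch keeps the sample variances and replaces all pairwise correlations by their cross-sectional average. Because the sample correlation matrix is positive semidefinite, the average correlation cannot fall below −1/(N − 1). The structured matrix is therefore generically positive definite, even when T < N.

```python
import numpy as np


def constant_correlation_cov(r: np.ndarray) -> np.ndarray:
    """'Mean model' covariance: sample variances plus one common pairwise correlation."""
    S = np.cov(r, rowvar=False)
    sd = np.sqrt(np.diag(S))
    C = S / np.outer(sd, sd)                  # sample correlation matrix
    N = C.shape[0]
    rho_bar = (C.sum() - N) / (N * (N - 1))   # average off-diagonal correlation
    F = rho_bar * np.outer(sd, sd)
    np.fill_diagonal(F, sd ** 2)              # keep the sample variances on the diagonal
    return F


# Hypothetical data with T = 104 < N = 500, where the sample matrix is singular.
rng = np.random.default_rng(seed=2)
r = rng.normal(scale=0.03, size=(104, 500))
F = constant_correlation_cov(r)
print("smallest eigenvalue of structured estimate:", np.linalg.eigvalsh(F).min())
```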

Shrinkage Estimators When determining model parameters $\theta$, it is very popular
to apply some shrinkage approach. This approach aims to combine the advantages of
a sample estimate $\hat\theta_S$ (pure reliance on sample data) and a structural estimate
$\hat\theta_{Struct}$ (robustness) by computing some sort of weighted average 8

$$\hat\theta = \lambda\, \hat\theta_S + (1 - \lambda)\, \hat\theta_{Struct}.$$


While practitioners often use ad hoc weighting schemes, the literature provides
a powerful Bayesian interpretation of shrinkage, which allows for the computation
of optimal weights. In this Bayesian view, the structural estimator serves as the
prior, which anchors the location of the model parameters $\theta$, and the sample
estimate acts as the conditioning signal. Bayes' rule then gives precise guidance on
how to combine prior and signal in order to compute the updated posterior that is
used as an input for the portfolio optimization. The above-mentioned Bayes-Stein
shrinkage used in Jorion [39, 40] focuses on estimates of the expected returns. In
the context of covariance estimation, an early contribution is Frost and Savarino [25].
More recently, Ledoit and Wolf [43] develop a more general Bayesian framework to
optimize the shrinkage intensity, in which the authors explicitly correct for the fact


7 See, e.g., Dangl and Kashofer [18] for an overview of structural estimates of the covariance 
structure of large equity portfolios — including shrinkage estimates. 

8 Shrinkage is usually a multivariate concept, i.e., λ is in general not a fixed scalar, but depends
on the observed data in some nonlinear fashion.
that the prior (i.e., the structural estimate of the covariance structure) and the
updating information (i.e., the sample covariance matrix) are determined from the
same data. Consequently, errors in these two inputs are not independent, and the
Bayesian estimate must control for this interdependence. 9
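The weighted average above can be applied to covariance matrices in a few lines, here with the constant-correlation (mean model) matrix as the structural estimate. This is a minimal sketch. The intensity λ is fixed by hand purely for illustration, and the data-driven optimal intensity of Ledoit and Wolf [43] is not reproduced. Any λ < 1 already turns the singular sample matrix of Example 3 into a positive definite one, because a convex combination of a positive semidefinite matrix and a positive definite matrix is positive definite.

```python
import numpy as np


def shrink_cov(r: np.ndarray, lam: float) -> np.ndarray:
    """lam * sample covariance + (1 - lam) * constant-correlation target."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    S = np.cov(r, rowvar=False)                      # sample estimate
    sd = np.sqrt(np.diag(S))
    N = S.shape[0]
    rho_bar = ((S / np.outer(sd, sd)).sum() - N) / (N * (N - 1))
    F = rho_bar * np.outer(sd, sd)                   # structural estimate (mean model)
    np.fill_diagonal(F, sd ** 2)
    return lam * S + (1.0 - lam) * F


rng = np.random.default_rng(seed=3)
r = rng.normal(scale=0.03, size=(104, 500))          # hypothetical data, T < N
for lam in (1.0, 0.5, 0.0):
    eig = np.linalg.eigvalsh(shrink_cov(r, lam))
    print(f"lambda = {lam:.1f}: smallest eigenvalue = {eig.min():.2e}")
```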

Weight Restrictions A commonly observed reaction to parameter uncertainty in
portfolio management is imposing ad hoc restrictions on portfolio weights. That is,
the discretion of a portfolio optimizer is limited by maximum as well as minimum
constraints on the weights of portfolio constituents. 10 In sample, weight restrictions
clearly reduce portfolio performance (as measured by the objective function used in
the optimization approach). 11 Nevertheless, out-of-sample studies show that in many
cases weight restrictions improve the risk-return trade-off of portfolios. Jagannathan
and Ma [37] provide evidence on why weight restrictions might be an efficient response
to estimation errors in the covariance structure. Analyzing minimum-variance
portfolios, they show that binding long-only constraints are equivalent to shrinking
extreme covariance estimates toward more moderate levels.
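The sketch below shows the effect of a long-only restriction on a minimum-variance portfolio. It compares the unconstrained solution $\Sigma^{-1}\mathbf{1}/(\mathbf{1}'\Sigma^{-1}\mathbf{1})$ with a long-only solution obtained by projected gradient descent onto the simplex. Projected gradient descent is a generic numerical method chosen here for self-containment (NumPy only). It is not the procedure used by Jagannathan and Ma [37], and the data and function names are hypothetical. In sample, the constrained portfolio can never have a lower variance than the unconstrained one, which is consistent with the statement that weight restrictions reduce in-sample performance.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = idx[u - css / idx > 0][-1]
    theta = css[rho - 1] / rho
    return np.maximum(v - theta, 0.0)


def min_variance_unconstrained(Sigma: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights Sigma^{-1} 1 / (1' Sigma^{-1} 1); shorts allowed."""
    x = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))
    return x / x.sum()


def min_variance_long_only(Sigma: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Projected gradient descent on w' Sigma w over the long-only simplex."""
    n = Sigma.shape[0]
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Sigma).max())  # 1 / Lipschitz constant of gradient
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * Sigma @ w)
    return w


# Hypothetical sample covariance from 60 observations of 20 assets.
rng = np.random.default_rng(seed=4)
Sigma = np.cov(rng.normal(scale=0.05, size=(60, 20)), rowvar=False)
w_un = min_variance_unconstrained(Sigma)
w_lo = min_variance_long_only(Sigma)
print("unconstrained: min weight", w_un.min().round(3), "variance", w_un @ Sigma @ w_un)
print("long-only:     min weight", w_lo.min().round(3), "variance", w_lo @ Sigma @ w_lo)
```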

Robust Optimization A more systematic approach to parameter uncertainty than
weight restrictions is robust optimization. After determining the uncertainty set S for
the relevant parameter vector $\mu$, robust portfolio optimization is usually formulated
as a max-min problem where the vector w of portfolio weights solves

$$w \in \arg\max_{w} \Big\{ \min_{\mu \in S} f(w; \mu) \Big\},$$

with $f(w; \mu)$ being the planner's objective function that she seeks to maximize.
This is a conservative or worst-case approach, which in many real-world
applications shows favorable out-of-sample properties (see Fabozzi et al. [24]; for
more details on robust and convex optimization problems and their applications in
finance, see Lobo et al. [48]). Provided a distribution of the parameters is available,
the rather extreme max-min approach could be relaxed by applying convex risk
measures. In the context of derivatives pricing, Bannoer and Scherer [5] develop the
concept of risk-capturing functionals and exemplify risk-averse pricing using an
average Value-at-Risk measure.
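Consider one risky asset with a box uncertainty set $S = [\hat\mu - \delta, \hat\mu + \delta]$ and a mean-variance objective $f(w; \mu) = w\mu - \frac{\gamma}{2}\sigma^2 w^2$. Both are assumptions of this illustration. The inner minimum then equals $w\hat\mu - \delta|w|$, and the max-min solution is a soft-thresholded mean-variance weight. The sketch below implements this closed form, which is a derivation for this toy case and not a result from the cited papers. It shows the conservative effect: the position shrinks as δ grows and vanishes once δ ≥ |μ̂|.

```python
import numpy as np


def robust_weight(mu_hat: float, delta: float, sigma: float, gamma: float) -> float:
    """Max-min weight for one risky asset with mu in [mu_hat - delta, mu_hat + delta].

    The inner minimum of w*mu - gamma/2*sigma^2*w^2 over the box is
    w*mu_hat - delta*|w|; maximizing it soft-thresholds mu_hat by delta.
    """
    return float(np.sign(mu_hat) * max(abs(mu_hat) - delta, 0.0) / (gamma * sigma ** 2))


def worst_case_objective(w, mu_hat, delta, sigma, gamma):
    """Inner minimum min_{mu in box} f(w; mu), valid for scalars or NumPy arrays w."""
    return w * mu_hat - delta * np.abs(w) - 0.5 * gamma * sigma ** 2 * w ** 2


# Hypothetical annual inputs: estimated premium 6 %, volatility 20 %, risk aversion 3.
mu_hat, sigma, gamma = 0.06, 0.20, 3.0
for delta in (0.0, 0.03, 0.08):
    print(f"delta = {delta:.2f}: robust weight = {robust_weight(mu_hat, delta, sigma, gamma):.3f}")
```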

Resampling A different approach to dealing with parameter uncertainty in asset
management is resampling. This technique neither attempts to produce more robust
parameter estimates nor builds a portfolio-optimization model that directly accounts
for parameter uncertainty. Resampling is a simulation-based approach that was first
described in the portfolio-optimization context by Michaud [52] and exists in
different specifications. It takes the sample estimates of mean


9 See also Ledoit and Wolf [44, 45] for more on shrinkage estimates of the covariance structure. 

10 Weight restrictions are frequently part of regulatory measures targeting the fund industry that
aim to control the risk characteristics of investment funds.

11 Green and Hollifield [28] argue that, given the apparent presence of a strong factor structure in the
cross section of equity returns, mean-variance optimal portfolios should take large short positions
in selected assets. Hence, a restriction to a long-only portfolio is expected to negatively influence
portfolio performance.
returns as well as of the covariance matrix and resamples a number R of return
‘histories’ (where R is typically between 1,000 and 10,000). From each of these
return histories, an estimate of the vector of mean returns as well as of the
covariance matrix is derived. These estimates form the ingredients to calculate R
different versions of the mean-variance frontier. Resampling approaches differ in the
set of restrictions used to determine the mean-variance frontiers and in how the
frontiers are averaged to obtain the final portfolio weights. Some authors criticize
that the unconditionally optimal portfolio does not simply follow from an average
over R vectors of conditionally optimal portfolio weights (see, e.g., Scherer [59] or
Markowitz and Usmen [50]); others point out that the ad hoc approach of resampling
could be improved by using a Bayesian approach (see, e.g., Scherer [60] or Harvey
et al. [33, 34]). Despite this critique, all of these studies acknowledge the favorable
out-of-sample characteristics of resampled portfolios.
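A bare-bones version of resampling might look as follows. It is an illustrative simplification, not Michaud's [52] exact procedure. Instead of averaging entire frontiers, it averages the unconstrained mean-variance weights $\Sigma^{-1}\mu/\gamma$ for a fixed, hypothetical risk aversion γ. The average is taken across R simulated histories drawn from a normal distribution with the sample moments. All inputs and function names are hypothetical.

```python
import numpy as np


def mean_variance_weights(mu: np.ndarray, Sigma: np.ndarray, gamma: float) -> np.ndarray:
    """Unconstrained mean-variance weights Sigma^{-1} mu / gamma (mu = expected excess returns)."""
    return np.linalg.solve(Sigma, mu) / gamma


def resampled_weights(mu_hat, Sigma_hat, T, R=1000, gamma=5.0, seed=0):
    """Average optimal weights over R histories of length T drawn from N(mu_hat, Sigma_hat)."""
    rng = np.random.default_rng(seed)
    draws = np.empty((R, mu_hat.size))
    for i in range(R):
        history = rng.multivariate_normal(mu_hat, Sigma_hat, size=T)  # one simulated history
        mu_i = history.mean(axis=0)
        Sigma_i = np.cov(history, rowvar=False)
        draws[i] = mean_variance_weights(mu_i, Sigma_i, gamma)
    return draws.mean(axis=0), draws


# Hypothetical monthly excess-return moments for three assets.
mu_hat = np.array([0.006, 0.004, 0.005])
Sigma_hat = np.array([[0.0025, 0.0010, 0.0008],
                      [0.0010, 0.0016, 0.0006],
                      [0.0008, 0.0006, 0.0020]])
w_avg, w_draws = resampled_weights(mu_hat, Sigma_hat, T=60)
print("weights from sample moments:", np.round(mean_variance_weights(mu_hat, Sigma_hat, 5.0), 3))
print("resampled average weights:  ", np.round(w_avg, 3))
print("std. dev. across histories: ", np.round(w_draws.std(axis=0), 3))
```

The spread of `w_draws` across histories is the kind of information that Fig. 3 visualizes. The averaging step is exactly the part that critics such as Markowitz and Usmen [50] regard as lacking a decision-theoretic foundation.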

Example 4 This simple example builds on Example 1, which discusses the optimal
weight of an active fund relative to a passive factor investment. An index investment
in the S&P 500 serves as the passive factor investment, and an active fund with
the constituents of the S&P 500 as its investment universe is the delegated active
investment strategy. In Example 1 we take a history of five years of monthly
log-returns (60 observations) to estimate mean returns as well as the covariance
structure and the alpha that the fund generates relative to the passive investment. We
use these estimates to conclude that the optimal portfolio weight of the fund should be
roughly 90 % and only 10 % of wealth should be held as a passive investment.

Being concerned about the quality of the parameter estimates that feed into the
optimization, we first examine the regression that was performed to obtain
these estimates. Assuming that log-returns are normally distributed, we conclude
from the regression in Example 1 that our best estimates of the parameters α, β, and
ν are

$$\hat\alpha = 17.51 \text{ bp/month}, \quad \hat\beta = 0.9821, \quad \hat\nu = 131.27 \text{ bp/month},$$

and that the estimation errors are t-distributed with standard deviations 12

$$\sigma_{58}(\hat\alpha) = 23.40 \text{ bp/month}, \quad \sigma_{58}(\hat\beta) = 0.0498, \quad \sigma_{59}(\hat\nu) = 454.91 \text{ bp/month}.$$

Furthermore, estimation errors in $\hat\alpha$ and $\hat\beta$ are negatively correlated with a
correlation coefficient ρ = −27.93 %, and errors in the estimate of the market risk
premium ν are uncorrelated with the errors in α and β.

A statistician would now conclude that neither the fund's α nor the factor's risk
premium ν is significantly different from zero, and thus that an investor should seek
exposure to neither of the two. Another approach is to extend the optimization problem
and include parameter uncertainty as an additional source of variability in the final
outcome.


12 Subscripts denote degrees of freedom.
[Fig. 3 image; horizontal axis: optimal fund weight]

Fig. 3 Distribution of the optimal portfolio weight of the active investment in the interval [−100 %, 200 %]
over 100,000 resampled histories. Approximately 29 % of the weights lie outside the stated interval


In contrast to a full consideration of parameter uncertainty, we use a resampling
approach, which addresses this issue in a more ad hoc manner. We take the empirical
estimates as the true moments of the joint distribution of factor returns and
active returns, and resample 100,000 histories. 13 Then, we perform the optimization
discussed in Example 1 on each of the simulated histories. Figure 3 illustrates the
distribution of optimal active weights across these 100,000 histories. Given the null
hypothesis that returns are normally distributed with the estimated moments,
resampling gives a good and reliable overview of the joint distribution of the model
parameters we estimate and, finally, of the distribution of optimal weights. We
can conclude that in the present setup, optimal active weights are not well determined,
since the estimation of the optimization model from only 60 observations per time
series is too noisy to yield a well-determined outcome. While resampling generates a
good picture of the overall effects of parameter uncertainty, it provides no natural
advice for the optimal portfolio decision beyond this illustrative insight. 14


13 This is the simplest version of resampling, mostly used in portfolio optimization. Given the null
hypothesis that returns are normally distributed, the sampling distributions of the empirical estimates
of the distribution moments around the true parameters are known (e.g., the standardized sample mean
is t-distributed); see Jobson and Korkie [38] for a detailed derivation of the small-sample properties
of these estimates. Thus, a more advanced approach samples, for each of the histories, first the model
parameters from their joint distribution and then, given the selected moments, the history of normally
distributed returns. Harvey et al. [33] is an example that uses advanced resampling to compare
Bayesian inference with simple resampling.

14 Some authors do propose schemes for generating portfolio decisions from the cross section of
the simulation results, see, e.g., Michaud and Michaud [53]. These schemes are, however, criticized
by other authors for not being well founded in decision theory, e.g., Markowitz and Usmen [50]
and others mentioned in the text above.
Finally, a study that illustrates the strong implications of parameter uncertainty
for optimal portfolio decisions particularly well is Pastor and Stambaugh [54]. The
authors question the paradigm that, due to mean-reverting returns, stocks are less risky
in the long run than over short horizons. This proposition is true if we know the
parameters of the underlying mean-reverting process with certainty. Pastor and
Stambaugh [54] show that as soon as we properly account for estimation errors in
model parameters, the additional uncertainty from estimation errors dominates the
variance reduction due to mean reversion; thus, they provide strong evidence against
time diversification in equity returns.


4.2 Model Uncertainty 

Qualitatively different from dealing with parameter uncertainty is the issue of model
uncertainty. Since the exact characteristics of the data-generating process underlying
asset returns are not at all clear, it is not obvious which attributes a model must
feature in order to capture all economically relevant effects of the portfolio selection
process. Hence, every model of optimal portfolio choice bears the risk of being
misspecified. In Sect. 4.1 we already mention the fact that traditional portfolio
models assume that mean returns and the covariance structure of returns are constant
over time. This contrasts with empirical evidence that the moments of the return
distribution are time varying. Limiting the history used to estimate distribution
parameters is a frequently used procedure to obtain a more current estimate. The
correct length of the historical data that should be used is, however, only rarely
determined in a systematic manner.

Bayesian Model Averaging A systematic approach to estimation under model
uncertainty is Bayesian model averaging. It builds on the concept of a Bayesian
decision-maker who has a prior about the probability weights of competing models
that are constructed to predict relevant variables (e.g., asset returns) one period ahead.
Observed returns are then used to determine posterior probability weights for each
of the models considered by applying Bayes' rule. 15 Each of the competing models
generates a predictive density for the next period's return. After observing the return,
models that have assigned a high likelihood to the observed value (compared to
others) experience an upward revision of their probability weight. In contrast, models
that have assigned a low likelihood to the observed value experience a downward
revision of their weight. Finally, the overall predictive density is calculated as a
probability-weighted sum of all models' predictive densities. Bayesian model
averaging is thus an elegant way to transform a problem of model uncertainty into a
standard portfolio problem: finding the optimal risk-return trade-off under the
derived predictive return distribution. This approach can, however, only be applied


15 The posterior probability that a certain model is the correct model is proportional to the product 
of the model’s prior probability weight and the realized likelihood of the observed return. 
under the assumption that the decision-maker has a single prior and shows
no aversion to the ambiguity inherent in the model uncertainty. 16

Raftery et al. [57] provide the technical details of Bayesian model averaging, and
Avramov [3], Cremers [16], and Dangl and Halling [17] are applications to return
prediction. Bayesian model averaging treats model uncertainty simply as an additional
source of variation. The predictive density for next period's returns becomes more
dispersed the higher the uncertainty about models that differ in their predictions. The
optimal portfolio selection procedure is then unchanged, but accounts for the additional
contribution to uncertainty.
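The updating rule of footnote 15 and the resulting mixture density can be sketched with two hypothetical models whose normal predictive densities differ only in their means. All numbers and function names are made up for illustration. The mixture variance exceeds each model's own variance by the dispersion of the model means, which is exactly the additional contribution to uncertainty described above.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bma_update(priors, means, sds, observed):
    """Posterior weights proportional to prior weight times predictive likelihood."""
    unnormalized = [p * normal_pdf(observed, m, s) for p, m, s in zip(priors, means, sds)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mixture_mean_var(weights, means, sds):
    """Mean and variance of the probability-weighted mixture of normal predictive densities."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return mean, second_moment - mean ** 2


# Two hypothetical models for next month's return; equal prior weights.
means, sds = [0.010, -0.005], [0.04, 0.04]
posterior = bma_update([0.5, 0.5], means, sds, observed=0.03)
m, v = mixture_mean_var(posterior, means, sds)
print("posterior weights:", [round(p, 3) for p in posterior])
print(f"mixture mean {m:.4f}, mixture sd {math.sqrt(v):.4f} (each model: 0.0400)")
```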

Ambiguity Aversion If it is not possible to explicitly assess the probability that a
certain model correctly mirrors the portfolio selection problem, and investors are
averse to this form of ambiguity, alternative portfolio selection approaches are needed.
Garlappi et al. [26] develop a portfolio selection approach for investors who have
multiple priors over return expectations and exhibit ambiguity aversion. The authors
prove that the portfolio selection problem of such an ambiguity-averse investor can
be formulated by imposing two modifications on the standard mean-variance model:
(i) an additional constraint that guarantees that the expected return lies in a specified
confidence region (which is how multiple priors are modeled) and (ii) an additional
minimization over all expected returns that conform to the priors (mirroring
ambiguity aversion). This model gives an intuitive illustration of the fact that
ambiguity-averse investors exhibit an explicit desire for robustness.


5 Conclusion 


The asset management industry has substantial influence on financial markets and 
on the welfare of many citizens. Increasingly, citizens are saving for retirement via 
delegated portfolio managers such as pension funds or mutual funds. In many cases 
there are multiple layers of delegation. It is, therefore, crucial for the welfare of 
modern societies that portfolio managers manage and control their portfolio risks. 
This article provides a bird's-eye view of risk management in asset management.

In traditional portfolio theory, the scope for risk control in portfolio management 
is limited. Risk management is essentially equivalent to determining the fraction
of capital that the manager invests in a broad, well-diversified basket of risky
securities. Thus, the “risk manager” only needs to find the optimal location on the 
securities market line. By contrast, in a more realistic model of the world that accounts 
for frictions, risk management becomes a central and important module in asset 
management that is frequently separate from other divisions of an asset manager. 
We identify several major frictions that require risk management that goes beyond 
choosing the weight of the riskless asset in the portfolio. First, in a world with costly 
information acquisition, investors do not hold the same mix of risky assets. This 


16 As explained in the introduction to this section, ambiguity aversion refers to preferences that 
express discomfort with uncertainty in the sense of Knight [42]. 


Risk Control in Asset Management: Motives and Concepts 




requires measuring a position’s risk contribution relative to the specific portfolio. 
Thus, risk management requires constant measurement of each portfolio position’s 
marginal risk contribution and comparing it to its marginal return contribution. This 
article derives a framework to calculate the marginal risk contributions and to decide 
on optimal portfolio weights of active managers. 

In many realistic instances, investors have nonstandard preferences, which make
them particularly sensitive to downside risks. We, therefore, review the main portfolio
insurance concepts for achieving protection against downside risk: stop-loss strategies,
option-based portfolio insurance, constant proportion portfolio insurance, ratcheting
strategies, and value-at-risk-based portfolio insurance. Using data for the S&P 500
since 1995, we simulate these alternative risk management concepts and demonstrate
their risk and return characteristics.

Finally, we point out that quantitative portfolio management usually builds on the 
output from rather stylized models, which must be chosen to capture the relevant 
market environment, and which must be calibrated and parameterized. Both these 
choices, i.e., model selection and model calibration, contain the risk of misspecifica- 
tion, and thus the risk of negative effects on out-of-sample portfolio performance. We 
survey and discuss risk management approaches to deal with parameter uncertainty, 
such as shrinkage procedures or resampling procedures. Qualitatively different from 
parameter uncertainty is the effect of model uncertainty. Different ways of dealing 
with model uncertainty via methods of Bayesian model averaging and the consider- 
ation of ambiguity aversion are, therefore, surveyed and discussed. 

The increased risk during the financial crisis and the subsequent sovereign debt
crisis has led to a substantially increased focus on risk control in the asset management
industry. At the same time, these market episodes have also demonstrated the
limitations of risk management in asset management. For example, volatile markets
without strong trends make existing downside protection strategies very expensive
for investors. Furthermore, risk management concepts for long-term investors are
still in their infancy. Scenario-based approaches, possibly combined with min-max
strategies, may be more useful in this context than standard risk management tools.

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Worst-Case Scenario Portfolio Optimization
Given the Probability of a Crash


Olaf Menkens 


Abstract Korn and Wilmott [9] introduced the worst-case scenario portfolio prob- 
lem. Although Korn and Wilmott assume that the probability of a crash occurring 
is unknown, this paper analyzes how the worst-case scenario portfolio problem is 
affected if the probability of a crash occurring is known. The result is that the addi- 
tional information of the known probability is not used in the worst-case scenario. 
This leads to a q-quantile approach (instead of a worst case), which is a
value-at-risk-style approach to the optimal portfolio problem with respect to the potential
crash. Finally, it will be shown that — under suitable conditions — every stochastic 
portfolio strategy has at least one superior deterministic portfolio strategy within this 
approach. 


1 Introduction 


Portfolio optimization in continuous time goes back to Merton [17]. Merton assumes
that the investor has two investment opportunities: one risk-free asset (bond) and one
risky asset (stock) with dynamics given by

$$dP_{0,0}(t) = P_{0,0}(t)\, r_0\, dt, \qquad P_{0,0}(0) = 1, \qquad \text{“bond”}$$

$$dP_{0,1}(t) = P_{0,1}(t)\big[\mu_0\, dt + \sigma_0\, dW_0(t)\big], \qquad P_{0,1}(0) = p_1, \qquad \text{“stock”}$$

with constant market coefficients $\mu_0, r_0, \sigma_0 > 0$, and where $W_0$ is a Brownian
motion on a complete probability space $(\Omega, \mathcal{F}, P)$. Finally, $X_0^\pi$ denotes the
wealth process of the investor given the portfolio strategy $\pi$ (which denotes the
fraction invested in the risky asset). More specifically, the wealth process satisfies

$$dX_0^\pi(t) = X_0^\pi(t)\Big[\big(r_0 + \pi(t)[\mu_0 - r_0]\big)\, dt + \pi(t)\,\sigma_0\, dW_0(t)\Big], \qquad X_0^\pi(0) = x.$$
Assuming that the utility function of the investor is given by $U(x) = \ln(x)$,
one can define the performance function for an arbitrary admissible portfolio strategy
$\pi(t)$ by

$$J_0(t, x, \pi) := E\big[\ln\big(X_0^{\pi,t,x}(T)\big)\big] = \ln(x) + E\Big[\int_t^T \Big(\Phi_0 - \frac{\sigma_0^2}{2}\big(\pi(s) - \pi_0\big)^2\Big)\, ds\Big]. \qquad (1)$$


Here,

$$\Phi_0 := r_0 + \frac{(\mu_0 - r_0)^2}{2\sigma_0^2} \qquad \text{and} \qquad \pi_0 := \frac{\mu_0 - r_0}{\sigma_0^2}$$

will be called the utility growth potential or earning potential and the optimal
portfolio strategy or Merton fraction, respectively. Using this, the portfolio
optimization problem in the Merton case (that is, without taking possible jumps into
account) is given by

$$\sup_{\pi(\cdot) \in A_0(x)} J_0(t, x, \pi) =: v_0(t, x) \;\big[= \ln(x) + \Phi_0 (T - t)\big], \qquad (2)$$

where $v_0$ is known as the value function in the Merton case. From Eq. (1), it is clear
that $\pi_0$ maximizes $J_0$. Hence, it is the optimal portfolio strategy for Eq. (2).
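Here is a quick numerical check of (1) and (2), sketched under illustrative parameter values. The inputs are $\sigma_0 = 0.25$, $r_0 = 0.05$, and $\mu_0 = 0.128125$. This value of $\mu_0$ is an assumption, chosen so that $\pi_0 = 1.25$. With these inputs the utility growth potential is $\Phi_0 \approx 0.098828$, the value quoted in the caption of Fig. 1. The code also confirms numerically that the integrand of (1) for a constant fraction is maximized at the Merton fraction, where it equals $\Phi_0$. Function names are hypothetical.

```python
import numpy as np


def merton_quantities(mu0: float, r0: float, sigma0: float):
    """Merton fraction pi_0 and utility growth potential Phi_0 (log utility)."""
    pi0 = (mu0 - r0) / sigma0 ** 2
    phi0 = r0 + (mu0 - r0) ** 2 / (2.0 * sigma0 ** 2)
    return pi0, phi0


def growth_rate(pi, mu0, r0, sigma0):
    """Integrand of (1) for a constant fraction pi: r0 + pi(mu0 - r0) - sigma0^2 pi^2 / 2."""
    return r0 + pi * (mu0 - r0) - 0.5 * sigma0 ** 2 * pi ** 2


# Assumed inputs: mu0 is chosen so that pi_0 = 1.25 with sigma0 = 0.25 and r0 = 0.05.
mu0, r0, sigma0 = 0.128125, 0.05, 0.25
pi0, phi0 = merton_quantities(mu0, r0, sigma0)
x, t, T = 1.0, 0.0, 50.0
v0 = np.log(x) + phi0 * (T - t)   # value function (2)
print(f"pi_0 = {pi0:.4f}, Phi_0 = {phi0:.6f}, v_0(0, 1) = {v0:.4f}")
```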

Merton's model has the disadvantage that it cannot model jumps in the price of
the risky asset. Therefore, Aase [1] extended Merton's model to allow for jumps in
the risky asset. In the simplest case, the dynamics of the risky asset change to

$$dP_1(t) = P_1(t)\big[\mu_0\, dt + \sigma_0\, dW_0(t) - k\, dN(t)\big],$$

where N is a Poisson process with intensity $\lambda > 0$ on $(\Omega, \mathcal{F}, P)$ and $k > 0$ is the
crash or jump size. In this setting, the performance function is given by

$$J_1(t, x, \pi) = \ln(x) + E\Big[\int_t^T \Big(\Phi_0 - \frac{\sigma_0^2}{2}\big(\pi(s) - \pi_0\big)^2 + \lambda \ln\big(1 - \pi(s)k\big)\Big)\, ds\Big].$$



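Using this, the optimal portfolio strategy can be computed by maximizing the integrand pointwise over $\pi < 1/k$, which yields

$$\pi^* = \frac{1}{2}\Big(\pi_0 + \frac{1}{k}\Big) - \sqrt{\frac{1}{4}\Big(\frac{1}{k} - \pi_0\Big)^2 + \frac{\lambda}{\sigma_0^2}}.$$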
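The pointwise maximization can be sketched in a few lines. Setting the derivative of the integrand to zero gives $(\mu_0 - r_0) - \sigma_0^2 \pi - \lambda k/(1 - \pi k) = 0$. This is a quadratic in π, and its root below 1/k is the optimal fraction (the integrand is concave there). The parameters below mirror those stated for Fig. 1, with $\mu_0 = 0.128125$ again assumed so that $\pi_0 = 1.25$. For 1/λ = 2.5 the resulting fraction is negative, in line with the short position discussed in the text. The function name is hypothetical.

```python
import math


def aase_fraction(mu0: float, r0: float, sigma0: float, lam: float, k: float) -> float:
    """Maximizer over pi < 1/k of Phi_0 - sigma0^2/2 (pi - pi_0)^2 + lam * ln(1 - pi k).

    First-order condition: (mu0 - r0) - sigma0^2 pi - lam k / (1 - pi k) = 0,
    a quadratic in pi; the smaller root is the one below 1/k.
    """
    pi0 = (mu0 - r0) / sigma0 ** 2
    a = 0.5 * (pi0 + 1.0 / k)
    b = math.sqrt(0.25 * (1.0 / k - pi0) ** 2 + lam / sigma0 ** 2)
    return a - b


# Assumed inputs as for Fig. 1 (mu0 chosen so that pi_0 = 1.25).
mu0, r0, sigma0, k = 0.128125, 0.05, 0.25, 0.25
for inv_lam in (50.0, 2.5):
    pi_star = aase_fraction(mu0, r0, sigma0, 1.0 / inv_lam, k)
    print(f"1/lambda = {inv_lam:>4}: optimal fraction = {pi_star:.4f}")
```

For λ = 0 the formula collapses to the Merton fraction $\pi_0$, which is a useful sanity check on the choice of root.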



Figure 1 shows the fraction invested in the risky asset in Merton's model (solid line) and
Aase's model for various λ (all the other lines). The dashed line below the solid
line is the case where 1/λ = 50, that is, the investor expects on average one crash
within 50 years. By comparison, the lowest line (the dash-dotted line) is the case
Fig. 1 Examples of Merton's optimal portfolio strategies. This figure is plotted with π0 = 1.25,
σ0 = 0.25, r = 0.05, k = 0.25, and T = 50. This implies that (μ0 − r0)/σ0 = 0.3125, Φ0 ≈
0.098828, and 1/k = 4

where 1/λ = 2.5, that is, the investor expects on average one crash within 2.5 years.
Note, however, that the fraction invested in the risky asset is negative in this case,
meaning that the optimal strategy is for the investor to go short in the risky asset.
This strategy is very risky because the probability that the investor will go bankrupt
is strictly positive. This can also be observed in practice, where several hedge funds
that were betting on a crash in the way described above went bankrupt.
Therefore, let us consider an ansatz that overcomes this problem.


1.1 Alternative Ansatz of Korn and Wilmott 


The ansatz made by Korn and Wilmott [9] is to distinguish between normal times
and crash times. In normal times, the same setup as in Merton's model is used. At
the crash time, the price of the risky asset falls by a factor of $k \in [k_*, k^*]$ (with
$0 \le k_* \le k^* < 1$). This implies that the wealth process $X_0^\pi(t)$ just before the crash
time $\tau$ satisfies

$$X_0^\pi(\tau-) = \underbrace{[1 - \pi(\tau)]\, X_0^\pi(\tau-)}_{\text{bond investment}} + \underbrace{\pi(\tau)\, X_0^\pi(\tau-)}_{\text{stock investment}}.$$
At the crash time, the price of the risky asset drops by a factor of $k$, implying

$$[1 - \pi(\tau)]\, X_0^\pi(\tau-) + \pi(\tau)\, X_0^\pi(\tau-)\,[1 - k] = [1 - \pi(\tau)k]\, X_0^\pi(\tau-) = X^\pi(\tau).$$

Therefore, one has a straightforward relationship between the wealth right before a crash
and the wealth right after a crash.
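The bond and stock bookkeeping behind this relationship fits in a few lines. The function name below is hypothetical. The example also shows why $\pi(\tau) < 1/k$ is needed to keep the post-crash wealth positive.

```python
def wealth_after_crash(x_before: float, pi: float, k: float) -> float:
    """Wealth right after a crash of relative size k, given wealth x_before just before it.

    The bond part (1 - pi) * x_before is unaffected. The stock part pi * x_before loses
    the fraction k. The sum equals (1 - pi * k) * x_before.
    """
    bond = (1.0 - pi) * x_before
    stock = pi * x_before
    return bond + stock * (1.0 - k)


print(wealth_after_crash(100.0, 0.5, 0.25))   # 87.5
print(wealth_after_crash(100.0, 4.0, 0.25))   # 0.0: pi = 1/k wipes out all wealth
print(wealth_after_crash(100.0, -0.5, 0.25))  # 112.5: a short position gains from the crash
```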

The main disadvantage of this ansatz is that one needs to know the maximal
number of crashes $M$ that can happen (in the following, we assume
for simplicity that $M = 1$ if not stated otherwise) and one needs to know the worst
crash size $k^*$ that can happen. On the other hand, no probabilistic assumptions are
made on the crash time or crash size. Therefore, Merton's approach, to maximize the
expected utility of terminal wealth, cannot be used in this context. Instead, the aim is
to find the best uniform worst-case bound, i.e., to solve

$$\sup_{\pi(\cdot) \in A(x)} \; \inf_{0 \le \tau \le T,\; k \in K} E\left[\ln\left(X^\pi(T)\right)\right], \tag{3}$$

where the terminal wealth satisfies $X^\pi(T) = (1 - \pi(\tau)k)\, X_0^\pi(T)$ in the case of a
crash of size $k$ at stopping time $\tau$. Moreover, $K = \{0\} \cup [k_*, k^*]$. This will be called
the worst-case scenario portfolio problem.

Note that one requires that $\pi(t) < 1/k^*$ for all $t \in [0, T]$ in order to avoid bankruptcy.
The value function to the above problem is defined via

$$v_c(t, x) := \sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K} E\left[\ln\left(X^{\pi,t,x}(T)\right)\right]. \tag{4}$$

Observe that this optimization problem can be interpreted as a stochastic differential
game (see Korn and Steffensen [12]), where the investor tries to maximize her
expected utility of terminal wealth while the counterparty (the market or nature) tries
to hit the investor as badly as possible by triggering a crash. The control of the investor
is $\pi$, the fraction of wealth invested into the risky asset, while the control of the
counterparty is the crash time $\tau$ and the crash size $k$. Figure 2 depicts the optimization
problem. For each control choice (that is, portfolio strategy) of the investor (e.g., $\pi_2$),
the investor calculates the expected utility of terminal wealth for all possible control
choices (that is, $(\tau, k)$) of the counterparty (which is the dotted line for the strategy
$\pi_2$). Then the worst-case scenario is determined for each portfolio strategy (e.g.,
$(\tau^{(2)}, k^{(2)})$ for the strategy $\pi_2$). Afterwards, the expected utility of terminal wealth
for this worst-case scenario is calculated and denoted by $W(\pi_2)$. The last step is to
find the strategy which maximizes the worst-case scenario function $W(\cdot)$. For the
three examples given in Fig. 2, this would be $\pi_3$. Notice that $\pi_3$ is special in that all
choices $(\tau, k)$ lead to the same expected utility of terminal wealth, that is, the worst
case for $\pi_3$ is independent of the scenario $(\tau, k)$.
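The sup-inf procedure of Fig. 2 can be mimicked by brute force if attention is restricted to constant strategies. This restriction is our simplification, since the optimal strategy (8) is time-dependent. The sketch also assumes that the market coefficients do not change after the crash. Under these assumptions, the crash time does not matter for a constant $\pi$, and $E[\ln X^\pi(T)] = \ln x + T\big(\Psi_0 - \frac{\sigma_0^2}{2}(\pi - \pi_0)^2\big) + \ln(1 - \pi k)$. Function names and parameter values are illustrative.

```python
import math


def expected_log_wealth(pi, k, T, x, psi0, pi0, sigma0):
    """E[ln X^pi(T)] for a constant fraction pi and a crash of size k (k = 0: no crash)."""
    return math.log(x) + T * (psi0 - 0.5 * sigma0 ** 2 * (pi - pi0) ** 2) + math.log(1.0 - pi * k)


def worst_case(pi, k_low, k_up, T, x, psi0, pi0, sigma0):
    """Infimum over K = {0} U [k_low, k_up]; ln(1 - pi*k) is monotone in k, so endpoints suffice."""
    return min(expected_log_wealth(pi, k, T, x, psi0, pi0, sigma0) for k in (0.0, k_low, k_up))


def best_constant_strategy(k_low, k_up, T, x, psi0, pi0, sigma0, n=20001):
    """Grid search for the constant pi in [-1, 1/k_up) maximizing the worst-case utility."""
    upper = 1.0 / k_up - 1e-9  # stay strictly below the bankruptcy bound 1/k_up
    grid = [-1.0 + (upper + 1.0) * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda p: worst_case(p, k_low, k_up, T, x, psi0, pi0, sigma0))


pi_best = best_constant_strategy(0.1, 0.25, 50.0, 1.0, 0.098828125, 1.25, 0.25)
print(round(pi_best, 3))
```

The maximizer is a constant fraction somewhat below $\pi_0$. Unlike the strategy of Eq. (8), it does not adapt as the remaining horizon shrinks.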

Observe that Aase [1] would fix $k$, model the crash time via a Poisson process,
and maximize the expected utility of terminal wealth. By comparison, the
worst-case scenario optimization method uses a probability-free approach on crash
time and size.

Fig. 2 Schematic interpretation of the worst-case scenario optimization (vertical axis: $E[\ln(X^\pi(T))]$)

Determining the optimal portfolio strategy as described above is evidently cumbersome.
Instead, consider the following approach. Define $v_1$ as the value
function as in Eq. (2), except that the subscript 1 indicates that this is the value function
in the Merton case after a crash has happened (and where the market parameters might
change; see Sect. 2 for details). To that end, a portfolio strategy $\hat\pi \ge 0$ determined
via the equation

$$J_0(t, x, \hat\pi) = v_1\big(t, x(1 - \hat\pi(t)k^*)\big) \quad \text{for all } t \in [0, T] \tag{5}$$

will be called a crash indifference strategy. This is because the investor gets the
same expected utility of terminal wealth if either no crash happens (left-hand side)
or a crash of the worst-case size $k^*$ happens (right-hand side). It is straightforward
to verify (see Korn and Menkens [10]) that there exists a unique crash indifference
strategy $\hat\pi$, which, if the market coefficients do not change after a crash, is given by
the solution of the differential equation

$$\hat\pi'(t) = \frac{\sigma_0^2}{2}\left(\hat\pi(t) - \frac{1}{k^*}\right)\big(\hat\pi(t) - \pi_0\big)^2, \tag{6}$$

$$\hat\pi(T) = 0. \tag{7}$$

This crash indifference strategy is bounded by $0 \le \hat\pi(t) < \min\{\pi_0, 1/k^*\}$. It can be shown
(see Korn and Wilmott [9] or Korn and Menkens [10]) that the optimal portfolio
strategy for an investor, who wants to maximize her worst-case scenario portfolio
problem, is given by

$$\pi^*(t) := \min\{\hat\pi(t), \pi_0\} \quad \text{for all } t \in [0, T]. \tag{8}$$

$\pi^*$ will be named the optimal crash hedging strategy or optimal worst-case scenario
strategy.
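A minimal numerical sketch of Eqs. (6)-(8) follows. It assumes unchanged market coefficients after the crash and the log-utility form $J_0(t,x,\pi) = \ln x + \int_t^T \big(\Psi_0 - \frac{\sigma_0^2}{2}(\pi(s) - \pi_0)^2\big)\,ds$. The ODE is integrated backward from $\hat\pi(T) = 0$ with a classical Runge-Kutta scheme. The sketch then checks the indifference property (5), which under these assumptions reduces to $\ln(1 - k^*\hat\pi(t)) + \frac{\sigma_0^2}{2}\int_t^T (\hat\pi(s) - \pi_0)^2\,ds = 0$. The parameter values are borrowed from Fig. 1 for illustration, and the function names are hypothetical.

```python
import math


def crash_indifference_path(pi0, sigma0, k_up, T, n=5000):
    """RK4 solution of pi'(t) = sigma0**2/2 * (pi - 1/k_up) * (pi - pi0)**2 with pi(T) = 0.

    Integrates backward in time; returns (ts, pis) with ts increasing from 0 to T.
    """
    def rhs(p):
        return 0.5 * sigma0 ** 2 * (p - 1.0 / k_up) * (p - pi0) ** 2

    h = T / n
    p = 0.0
    pis = [p]
    for _ in range(n):  # one classical RK4 step of size -h
        s1 = rhs(p)
        s2 = rhs(p - 0.5 * h * s1)
        s3 = rhs(p - 0.5 * h * s2)
        s4 = rhs(p - h * s3)
        p -= h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        pis.append(p)
    pis.reverse()
    return [i * h for i in range(n + 1)], pis


def indifference_residual(ts, pis, pi0, sigma0, k_up):
    """ln(1 - k_up*pi(0)) + sigma0**2/2 * int_0^T (pi - pi0)**2 dt (trapezoidal rule); ~0 by Eq. (5)."""
    h = ts[1] - ts[0]
    sq = [(p - pi0) ** 2 for p in pis]
    integral = h * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return math.log(1.0 - k_up * pis[0]) + 0.5 * sigma0 ** 2 * integral


ts, pis = crash_indifference_path(pi0=1.25, sigma0=0.25, k_up=0.25, T=50.0)
pi_star = [min(p, 1.25) for p in pis]  # optimal crash hedging strategy, Eq. (8)
print(round(pi_star[0], 4), indifference_residual(ts, pis, 1.25, 0.25, 0.25))
```

In this example $\hat\pi$ stays below $\pi_0$, so the minimum in Eq. (8) always selects $\hat\pi$.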

Figure 3 shows the optimal worst-case scenario strategies of Korn/Wilmott if at
most one (solid line), two (dashed line), or three (dash-dotted line) crashes can
happen. Assuming that the investor has an initial investment horizon of $T = 50$ and
expects to see at most three crashes, an optimal worst-case scenario investor would use
the portfolio strategy $\pi_3^*(t)$ until she observes a first crash, say at time $\tau_1$. After having
observed a crash, the investor would switch to the strategy $\pi_2^*(t)$, since the investor
expects to see at most two further crashes in the remaining investment horizon $T - \tau_1$;
and so on. Finally, if the investor expects to observe no further crash, she will switch
to the Merton fraction $\pi_0$.

The worst-case scenario strategies are now compared to the optimal portfolio
strategy in Aase's model, where $\lambda(t) = 1/(T - t)$ (see dotted line in Fig. 3), that is, the
investor expects to see on average one crash over her remaining investment horizon
$T - t$. Clearly, setting $\lambda$ in this way is somewhat unrealistic. Nevertheless, this
extreme example is used to point out several disadvantages of the expected utility
approach of Aase. First, considering a $\lambda$ which changes over time and depends on
the investment horizon of the investor leads not only to a time-changing optimal
strategy $\pi_\lambda(t)$, but also to price dynamics of the risky asset which depend on the
investment horizon of the investor. Hence, any two investors with different investment
horizons would work with different price dynamics of the risky asset. Second, as
the investor approaches the investment horizon, $\lambda(t) \to \infty$ (that is, a crash happens
almost surely), thus $\pi_\lambda(t) \to -\infty$, which would lead to big losses on short-term
investment horizons if no crash happens. Of course, these losses would average out
with the gains made if a crash happens, remembering that the investor expects, on
average, one crash over the remaining investment horizon. This is the effect
of averaging the crash out in an expected utility sense (compared to the worst-case
approach of Korn/Wilmott).

Fig. 3 Examples of worst-case optimal portfolio strategies

In principle, it would be possible to cut off $\pi_\lambda(t)$ at zero, that is, one would not
allow for short-selling. This would imply cutting off $\lambda(t)$ at $(\mu_0 - r_0)/k$, and there is no
economic interpretation why this should be done (except that short-selling might not
be allowed). Finally, note that it is also possible to set $\lambda$ such that one expects to see
at least one crash with probability $q$ (e.g., $q = 5\,\%$); however, this would not remedy
the two disadvantages mentioned above.

Why is the worst-case scenario approach more suitable than the standard expected
utility approach in the presence of jumps? The standard expected utility approach
averages out the impact of the jumps over all possible scenarios. In other
words, the corresponding optimal strategy offers protection only on average over
all possible scenarios, which is good as long as either no jump or just a small
jump happens. However, if a large jump happens, the protection is negligible. By
comparison, the worst-case scenario approach offers full protection from a jump
up to the worst-case jump size assumed.

The situation can be compared to the case of buying liability insurance. The
standard utility approach would look at the average of all possible claim sizes (say,
e.g., 100,000 EUR), and its optimal strategy would be to buy liability insurance
with a cover of 100,000 EUR. However, the usual advice is to buy liability insurance
with a cover which is as high as possible; this solution corresponds to the worst-case
scenario approach. In other words, the aim is to insure the rare large jumps.
This observation is supported by the fact that many insurance contracts include a retention
(which excludes small jumps from the insurance).


1.2 Literature Review 

To the best of our knowledge, Hua and Wilmott [8] were the first to consider the 
worst-case scenario approach in a binomial model to price derivatives. Korn and 
Wilmott [9] were the first to apply this approach to portfolio optimization, and Korn 
and Menkens [10] developed a stochastic control framework for this approach, while 
Korn and Steffensen [12] considered this approach as a stochastic differential game. 
Korn and Menkens [10] and Menkens [16] looked at changing market coefficients 
after a crash. Seifried [22] developed a martingale approach for the worst-case scenario.
Moreover, the worst-case scenario approach has been applied to the optimal
investment problem of an insurance company (see Korn [11]) and to optimize reinsurance
for an insurance company (see Korn et al. [13]). Korn et al. [13] also show
in their setting that the worst-case scenario approach has a negative diversification
effect. Furthermore, both portfolio optimization under proportional transaction costs
(see Belak et al. [4]) and the infinite time consumption problem (see Desmettre
et al. [7]) have been studied in a worst-case scenario optimization setting. Mönnig
[18] applies the worst-case scenario approach in a stochastic target setting to compute
option prices. Finally, Belak et al. [2, 3] allow for a random number of crashes,
while Menkens [15] analyzes the costs and benefits of using the worst-case scenario
approach.

Notice that there is a different worst-case scenario optimization problem which 
is also known as Wald’s Maximin approach (see Wald [23, 24]). The following 
quotation is taken from Wald [23, p. 279]: 

A problem of statistical inference may be interpreted as a zero sum two person game as
follows: Player 1 is Nature and player 2 is the statistician. [. . .] The outcome K[θ, w(E)] of
the game is the risk r[θ | w(E)] of the statistician. Clearly, the statistician wishes to minimize
r[θ | w(E)]. Of course, we cannot say that Nature wants to maximize r[θ | w(E)]. However, if
the statistician is in complete ignorance as to Nature's choice, it is perhaps not unreasonable to
base the theory of a proper choice of w(E) on the assumption that Nature wants to maximize
r[θ | w(E)].

This is a well-known concept in decision theory and is also known as robust optimization
(see, e.g., Bertsimas et al. [5] or Rustem and Howe [21] and the references
therein). However, while the ansatz is the same, it is usually assumed that the parameters
(in our case $r_0$, $\mu_0$, and $\sigma_0$) are unknown within certain boundaries. Therefore,
this is a parameter uncertainty problem which is solved using a worst-case scenario
approach instead of using perturbation analysis. Observe that this usually involves
numerical optimization procedures. Finally, note that the optimal strategies
can be computed directly only in the special case that only $\mu_0$ is uncertain (see
Mataramvura and Øksendal [14], Øksendal and Sulem [19], or Pelsser [20] for a
recent application in an insurance setting).

By comparison, the worst-case scenario approach considered in this paper is 
taking (possibly external) shocks/jumps/crashes into account — and not parameter 
uncertainty. While the original idea and the wording are similar or even the same, it 
is clear that the worst-case scenario approach of Korn and Wilmott [9] is different 
from the robust optimization approach in decision theory. 

The remainder of this paper is organized as follows. Section 2 introduces the setup
of the model which will be considered, and Sect. 3 solves the optimization problem
if the probability of a potential crash is known. As a consequence, the q-quantile
crash hedging strategy will be developed in Sect. 4. Section 5 gives examples of
the q-quantile crash hedging strategy, while Sect. 6 shows that stochastic portfolio
strategies are never superior to their corresponding deterministic portfolio strategies.
Finally, Sect. 7 concludes.
2 Setup of the Model


Let us work with the model introduced above and let us make the following refinements.
First, it has been tacitly assumed that the investor is able to realize that the
crash has happened. Thus, let us model its occurrence via an $\mathcal{F}$-stopping time $\tau$. To
model the fact that the investor is able to realize that a jump of the stock price has happened,
it is supposed that the investor's decisions are adapted to the $P$-augmentation
$\{\mathcal{F}_t\}$ of the filtration generated by the Brownian motion $W(t)$. The difficulty of this
approach is to determine the optimal strategy after a crash because the starting point
is random; however, Seifried [22] solved this problem.

Let us further suppose that the market conditions change after a possible crash.
Let therefore $k$ (with $k \in [k_*, k^*]$) be the arbitrary size of a crash at time $\tau$. The price
of the bond and the risky asset after a crash of size $k$ happened at time $\tau$ is assumed
to be

$$dP_{1,0}(t) = P_{1,0}(t)\, r_1\, dt, \qquad P_{1,0}(\tau) = P_{0,0}(\tau), \tag{9}$$

$$dP_{1,1}(t) = P_{1,1}(t)\left[\mu_1\, dt + \sigma_1\, dW(t)\right], \qquad P_{1,1}(\tau) = (1 - k)\, P_{0,1}(\tau), \tag{10}$$

with constant market coefficients $r_1$, $\mu_1$, and $\sigma_1 > 0$ after a possible crash of size $k$
at time $\tau$. That is, this is the same market model as before the crash except that the
market parameters are allowed to change after a crash has happened.

It is important to keep in mind that the investor does not know for certain that a
crash will occur; the investor only thinks that it is possible. An investor who knows
that a crash will happen within the time horizon $[0, T]$ has additional information
and is therefore an insider. The set of possible crash heights of the insider is indeed
$K_I := [k_*, k^*]$, while the set of possible crash heights of the investor who thinks that
a crash is possible is $K := \{0\} \cup [k_*, k^*]$. In this paper, only the portfolio problem
of the investor who thinks a crash is possible is considered.

For simplicity, the initial market will also be called market 0, while the market 
after a crash will be called market 1. In order to set up the model, the following 
definitions are needed. 

Definition 1 (i) For $i = 0, 1$, let $A_i(s, x)$ be the set of admissible portfolio
processes $\pi(t)$ corresponding to an initial capital of $x > 0$ at time $s$, i.e.,
$\{\mathcal{F}_t, s \le t \le T\}$-progressively measurable processes such that

(a) the wealth equation in market $i$ in the usual crash-free setting

$$dX_i^{\pi,s,x}(t) = X_i^{\pi,s,x}(t)\left[\big(r_i + \pi(t)[\mu_i - r_i]\big)\, dt + \pi(t)\,\sigma_i\, dW(t)\right], \tag{11}$$

$$X_i^{\pi,s,x}(s) = x \tag{12}$$

has a unique non-negative solution $X_i^{\pi,s,x}(t)$ and satisfies

$$\int_s^T \left[\pi(t)\, X_i^{\pi,s,x}(t)\right]^2 dt < \infty \quad P\text{-a.s.}, \tag{13}$$

i.e., $X_i^{\pi,s,x}(t)$ is the wealth process in market $i$ in the crash-free world, which
uses the portfolio strategy $\pi$ and starts at time $s$ with initial wealth $x$.
Furthermore, $X_i^\pi(t) := X_i^{\pi,0,x}(t)$ will be used as an abbreviation.

(b) $\pi(t)$ has left-continuous paths with right limits.

(ii) The corresponding wealth process $X^\pi(t)$ in the crash model, defined as

$$X^\pi(t) = \begin{cases} X_0^\pi(t) & \text{for } s \le t < \tau, \\ [1 - \pi(\tau)k]\, X_1^{\pi,\tau,X_0^\pi(\tau)}(t) & \text{for } t \ge \tau \ge s, \end{cases} \tag{14}$$

given the occurrence of a jump of height $k$ at time $\tau$, is strictly positive. Thereby,
it is assumed that the crash time $\tau$ is an $\mathcal{F}$-stopping time. The set of admissible
portfolio strategies is obviously given by $A_0(s, x)$ as long as no crash happens.
After a crash at time $\tau$, the set is given by $A_1(\tau, x)$, which is defined scenario-wise,
that is, via $A_1(\tau(\omega), x)$ for all $\omega \in \Omega$. Hence,

$$A(s, x) := \left\{\pi(t) \text{ with } t \in [s, T] : \pi|_{[s,\tau]} \in A_0(s, x)|_{[s,\tau]} \text{ and } \pi|_{[\tau,T]} \in A_1(\tau, x)\right\}.$$

(iii) $A(x)$ is used as an abbreviation for $A(0, x)$.

Finally, it is clear how to extend the definitions given above only for $i = 0$ to
$i = 1$: simply replace the zeros by ones.


3 Optimal Portfolios Given the Probability of a Crash 


In this section, let us suppose that the investor knows the probability of a crash
occurring. Let $p$, with $p \in [0, 1]$, be the probability that a crash can happen (but
need not necessarily happen)¹. Note that the following argument also holds for time-dependent
$p$ (that is, $p(t)$); however, to simplify the notation, it is assumed that $p$ is
constant. In this situation, the optimization problem can be split up into two problems
(a crash can occur, no crash happens) which have to be solved simultaneously. To that
end, define for $p \in [0, 1]$

$$E_p\left[\ln\left(X^{\pi,t,x}(T)\right)\right] := \underbrace{p\, E\left[\ln\left(X^{\pi,t,x}(T)\right)\right]}_{\text{a crash can occur}} + \underbrace{(1 - p)\, E\left[\ln\left(X_0^{\pi,t,x}(T)\right)\right]}_{\text{no crash happens}}.$$

Using this definition, the optimization problem can be written as

$$\begin{aligned}
\sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K} E_p\left[\ln\left(X^{\pi,t,x}(T)\right)\right]
&= \sup_{\pi(\cdot) \in A(t,x)} \left\{ p \inf_{t \le \tau \le T,\; k \in K} E\left[\ln\left(X^{\pi,t,x}(T)\right)\right] + (1 - p)\, E\left[\ln\left(X_0^{\pi,t,x}(T)\right)\right] \right\} \\
&= \sup_{\pi(\cdot) \in A(t,x)} \left\{ p \inf_{t \le \tau \le T,\; k \in K} E\left[v_1\left(\tau, X_0^{\pi,t,x}(\tau)\left(1 - \pi(\tau)k\right)\right)\right] + (1 - p)\, J_0(t, x, \pi) \right\}
\end{aligned} \tag{15}$$

¹ Observe that the important information is that no crash will happen with a probability of at least
$1 - p$. If one were to say that a crash will happen with probability $p$, the investor would become an
insider with an adjusted optimization problem as described in Sect. 2, p. 9. However, this insider
approach would make the discussion considerably more difficult. Therefore, to simplify the discussion,
the approach of "no crash happens/a crash can happen" is taken here.


Observe that the two extremes, $p \in \{0, 1\}$, are straightforward to solve:

(A) $p = 1$:

$$\sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K} E_1\left[\ln\left(X^{\pi,t,x}(T)\right)\right] = \sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K} E\left[\ln\left(X^{\pi,t,x}(T)\right)\right].$$

Thus, this is the original worst-case scenario portfolio problem. The solution is
already known.

(B) $p = 0$:

$$\sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K} E_0\left[\ln\left(X^{\pi,t,x}(T)\right)\right] = \sup_{\pi(\cdot) \in A_0(t,x)} E\left[\ln\left(X_0^{\pi,t,x}(T)\right)\right],$$

which is the classical optimal portfolio problem of Merton. The solution is well
known and is given in our notation (compare with Eq. (2)) by $\pi_0$.

Let us now consider the case $p \in (0, 1)$. Denoting the optimal crash hedging
strategy in this situation by $\hat\pi_p$, Eq. (15) can be rewritten as

$$J_0(t, x, \hat\pi_p) = p \cdot v_1\big(t, x(1 - \hat\pi_p(t)k^*)\big) + (1 - p)\, J_0(t, x, \hat\pi_p)$$

$$\iff \quad J_0(t, x, \hat\pi_p) = v_1\big(t, x(1 - \hat\pi_p(t)k^*)\big),$$

where the last equation is obtained from the first equation by solving the first equation
for $J_0$ (subtracting $(1 - p)\, J_0(t, x, \hat\pi_p)$ from both sides and dividing by $p > 0$). Since the
latter equation is the indifference Eq. (5) in this setting, which leads to the same ODE and
boundary condition as in Korn and Wilmott [9], it follows that $\hat\pi_p = \hat\pi$ (see the paragraph
between Eqs. (5) and (6) for details). This result shows that the crash hedging strategy
remains the same even if the probability of a crash is known. Thus, this result justifies the
wording "worst-case scenario" of the above-developed concept. This is due to the fact that
the worst-case scenario should be independent of the probability of the worst case, which
has been shown above. Let us summarize this result in a proposition.

O. Menkens

Proposition 1 Given that the probability of a crash is positive, the solution of the
worst-case scenario portfolio problem as it has been defined in Eq. (3) is independent
of the probability of the worst case occurring.

If the probability of a crash is zero, the worst-case scenario portfolio problem 
reduces to the classical crash-free portfolio problem. 


4 The q-quantile Crash Hedging Strategy

Obviously, the concept of the worst-case scenario has the disadvantage that additional
information (namely the given probability of a crash and the probability distribution
of the crash sizes) is not used. However, if the probability of a crash and the probability
distribution of the crash size are known, it is possible to construct the (lower) q-quantile
crash hedging strategy.

Assume that $p(t) \in [0, 1]$ is the given probability of a crash at time $t \in [0, T]$ and
assume that $f(k, t) \ge 0$ is the given density of the distribution function for a crash
of size $k \in [k_*, k^*]$ at time $t$. Moreover, suppose that a function $q : [0, T] \to [0, 1]$
is given. With this, define

$$k_q(t) := \begin{cases} 0 & \text{if } 1 - p(t) \ge q(t), \\[4pt] \inf\left\{k_q : 1 - p(t) + p(t) \displaystyle\int_{k_*}^{k_q} f(k, t)\, dk \ge q(t)\right\} & \text{if } 1 - p(t) < q(t) \text{ and } \pi \ge 0, \\[4pt] \sup\left\{k_q : 1 - p(t) + p(t) \displaystyle\int_{k_q}^{k^*} f(k, t)\, dk \ge q(t)\right\} & \text{if } 1 - p(t) < q(t) \text{ and } \pi < 0, \end{cases}$$

for any given portfolio strategy $\pi$. This has the following interpretation. The probability
that at most a crash of size $k_q(t)$ happens at time $t$ is $q(t)$. Equivalently, the
probability that a crash larger than $k_q(t)$ will happen at time $t$ is at most $1 - q(t)$.
Obviously, this is a value-at-risk approach which relaxes the worst-case scenario
approach.
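The second case of this definition (non-negative strategies) can be evaluated for any crash-size distribution by bisection on the defining inequality. The sketch below is ours. It assumes that the conditional distribution function `cdf` of the crash size on $[k_*, k^*]$ is given, continuous, and non-decreasing, and that $p$ and $q$ are constant. The helper names are hypothetical.

```python
from typing import Callable


def k_quantile(p: float, q: float, cdf: Callable[[float], float],
               k_low: float, k_up: float, tol: float = 1e-12) -> float:
    """Lower q-quantile crash size k_q for a non-negative strategy.

    Returns 0 if 1 - p >= q (first case of the definition). Otherwise returns the smallest
    k in [k_low, k_up] with 1 - p + p * cdf(k) >= q, located by bisection up to tol.
    """
    if 1.0 - p >= q:
        return 0.0
    lo, hi = k_low, k_up  # invariant: the condition holds at hi but not at lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - p + p * cdf(mid) >= q:
            hi = mid
        else:
            lo = mid
    return hi


def uniform_cdf(k: float) -> float:
    """Uniform crash sizes on [0.1, 0.5] (illustrative choice)."""
    return min(max((k - 0.1) / 0.4, 0.0), 1.0)


print(k_quantile(0.1, 0.95, uniform_cdf, 0.1, 0.5))   # close to 0.3
print(k_quantile(0.01, 0.95, uniform_cdf, 0.1, 0.5))  # 0.0, since 1 - p >= q
```

Negative strategies would use the mirrored condition from the third row, searching downward from $k^*$.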

Notice that the worst case of a non-negative portfolio strategy is either a crash
of size $k^*$ or no crash. On the other hand, the worst case of a negative portfolio
strategy is either a crash of size $k_*$ or no crash. Correspondingly, the q-quantile
is calculated differently for negative portfolio strategies (see the third row) than for
non-negative portfolio strategies (see the second row). Furthermore, denote by

$$K_q(t) := \begin{cases} \{0\} & \text{if } k_q(t) = 0, \\ \{0\} \cup [k_*, k_q(t)] & \text{if } k_q(t) \ne 0 \text{ and } \pi \ge 0, \\ \{0\} \cup [k_q(t), k^*] & \text{if } k_q(t) \ne 0 \text{ and } \pi < 0. \end{cases}$$


Worst-Case Scenario Portfolio Optimization Given the Probability of a Crash 




Definition 2 (i) The problem to solve

$$\sup_{\pi(\cdot) \in A(x)} \; \inf_{0 \le \tau \le T,\; k \in K_q(\tau)} E\left[\ln\left(X^\pi(T)\right)\right], \tag{16}$$

where the terminal wealth $X^\pi(T)$ in the case of a crash of size $k$ at time $\tau$ is
given by

$$X^\pi(T) = [1 - \pi(\tau)k]\, X_1^{\pi,\tau,X_0^\pi(\tau)}(T), \tag{17}$$

with $X_1^{\pi,\tau,X_0^\pi(\tau)}(t)$ as above, is called the (lower) q-quantile scenario portfolio
problem.

(ii) The value function to the above problem is defined via

$$W_q(t, x) := \sup_{\pi(\cdot) \in A(t,x)} \; \inf_{t \le \tau \le T,\; k \in K_q(\tau)} E\left[\ln\left(X^{\pi,t,x}(T)\right)\right]. \tag{18}$$

(iii) A portfolio strategy $\hat\pi_q$ determined via the equation

$$W_q(t, x) = v_1\big(t, x(1 - \hat\pi_q(t)k_q(t))\big) \quad \text{for all } t \in [0, T] \text{ with } k_q(t) > 0$$

will be called a (lower) q-quantile crash hedging strategy.

(iv) A portfolio strategy $\check\pi_q$ is a partial (lower) q-quantile crash hedging strategy
if it is, for any $t \in [0, T]$, either a q-quantile crash hedging strategy or a solution
to the q-quantile scenario portfolio problem.

It is straightforward to see that the 1-quantile scenario portfolio problem is equivalent
to the worst-case scenario portfolio problem given in Eq. (3). Moreover, the
1-quantile crash hedging strategy is equivalent to the crash hedging strategy in
Definition 3.1 in Menkens [16, p. 602].

Remark 1 (i) Clearly, the definition given in Eq. (16) is different from the corresponding
definition given in Sect. 3, and it leads to the same solution only in the
two extreme cases of either $p = 1$ or $p = 0$.

(ii) Notice that the q-quantile scenario portfolio problem is only a q-quantile concerning
the crash. The randomness of the market movement, represented in the
model by a geometric Brownian motion, has been averaged out, namely by taking
the expectation and not the q-quantile.

Define the support of $k_q$ to be

$$\operatorname{supp}(k_q) := \{t \in [0, T] : k_q(t) > 0\}.$$


Using this, it is possible to show the following. 

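Theorem 1 Let us suppose that $k_q$ is continuously differentiable on $\operatorname{supp}(k_q)$ with
respect to $t$.

(i) Then there exists a unique (lower) q-quantile crash hedging strategy $\hat\pi_q$, which
is on $\operatorname{supp}(k_q)$ given by the solution of the differential equation

$$\hat\pi_q'(t) = \left(\hat\pi_q(t) - \frac{1}{k_q(t)}\right)\left[\frac{\sigma_0^2}{2}\big(\hat\pi_q(t) - \pi_0\big)^2 + \Psi_1 - \Psi_0\right] - \frac{k_q'(t)}{k_q(t)}\,\hat\pi_q(t), \tag{19}$$

$$\hat\pi_q(T) = 0. \tag{20}$$

For $t \in [0, T] \setminus \operatorname{supp}(k_q)$ set $\hat\pi_q(t) := \pi_0$.

Moreover, if $\Psi_1 > r_0$, then the q-quantile crash hedging strategy is bounded by

$$0 \le \hat\pi_q(t) < \frac{1}{k_q(t)} \le \frac{1}{k_*} \quad \text{for } t \in \operatorname{supp}(k_q).$$

Additionally, if $\Psi_1 < \Psi_0$ and $\pi_0 > 0$, the q-quantile crash hedging strategy has
another upper bound with $\hat\pi_q(t) < \pi_0 - \sqrt{\frac{2}{\sigma_0^2}(\Psi_0 - \Psi_1)}$.

On the other side, if $\Psi_1 < r_0$, the q-quantile crash hedging strategy is bounded
by

$$\pi_0 - \sqrt{\frac{2}{\sigma_0^2}(\Psi_0 - \Psi_1)} < \hat\pi_q(t) < 0 \quad \text{for } t \in [0, T).$$

(ii) If $\Psi_1 < \Psi_0$ and $\pi_0 < 0$, there exists a partial q-quantile crash hedging strategy
$\check\pi_q$ at time $t$ (which is different from $\hat\pi_q$) if

$$S_q(t) := T - \frac{\ln\big(1 - \pi_0 k_q(t)\big)}{\Psi_0 - \Psi_1} > 0 \quad \text{for } t \in \operatorname{supp}(k_q). \tag{21}$$

With this, $\check\pi_q(t)$ is given by the unique solution of the differential equation

$$\check\pi_q'(t) = \left(\check\pi_q(t) - \frac{1}{k_q(t)}\right)\left[\frac{\sigma_0^2}{2}\big(\check\pi_q(t) - \pi_0\big)^2 + \Psi_1 - \Psi_0\right] - \frac{k_q'(t)}{k_q(t)}\,\check\pi_q(t),$$

$$\check\pi_q\big(S_q(t)\big) = \pi_0.$$

For $S_q(t) < 0$ set $\check\pi_q(t) := \pi_0$. This partial crash hedging strategy is
bounded by

$$\pi_0 - \sqrt{\frac{2}{\sigma_0^2}(\Psi_0 - \Psi_1)} < \check\pi_q(t) \le \pi_0 < 0.$$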


If $k_q$ is independent of the time $t$, the optimal portfolio strategy for an investor who
wants to maximize her q-quantile scenario portfolio problem is given by

$$\pi_q^*(t) := \min\{\hat\pi_q(t), \check\pi_q(t), \pi_0\} \quad \text{for all } t \in [0, T], \tag{22}$$

where $\check\pi_q$ is taken into account only if it exists. $\pi_q^*$ will also be called the optimal
q-quantile crash hedging strategy.
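To illustrate Theorem 1 (i) with a time-dependent $k_q$, the sketch below integrates Eqs. (19)-(20) backward with a Runge-Kutta scheme. All inputs are hypothetical: $k_q(t) = 0.2 + 0.1e^{-0.05t}$ (decreasing), $\pi_0 = 0.75$, $\sigma_0 = 0.25$, $r_0 = 0.03$, $\Psi_1 = 0.04$. We also assume the log-utility relation $\Psi_0 = r_0 + \frac{\sigma_0^2}{2}\pi_0^2$. As a consistency check, the sketch verifies the indifference relation $\int_0^T \big(\Psi_0 - \frac{\sigma_0^2}{2}(\hat\pi_q - \pi_0)^2\big)dt = \ln(1 - k_q(0)\hat\pi_q(0)) + \Psi_1 T$, which Eq. (19) is designed to preserve. It also checks the bounds stated for $r_0 < \Psi_1 < \Psi_0$ and $\pi_0 > 0$.

```python
import math


def solve_eq19(k_q, dk_q, pi0, sigma0, psi0, psi1, T, n=5000):
    """RK4 solution of Eq. (19) with terminal condition (20), pi(T) = 0, integrated backward.

    k_q and dk_q are callables returning k_q(t) > 0 and its derivative.
    Returns (ts, pis) on a uniform grid with ts increasing from 0 to T.
    """
    def rhs(t, p):
        bracket = 0.5 * sigma0 ** 2 * (p - pi0) ** 2 + psi1 - psi0
        return (p - 1.0 / k_q(t)) * bracket - dk_q(t) / k_q(t) * p

    h = T / n
    t, p = T, 0.0
    pis = [p]
    for _ in range(n):  # one classical RK4 step of size -h
        s1 = rhs(t, p)
        s2 = rhs(t - 0.5 * h, p - 0.5 * h * s1)
        s3 = rhs(t - 0.5 * h, p - 0.5 * h * s2)
        s4 = rhs(t - h, p - h * s3)
        p -= h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        t -= h
        pis.append(p)
    pis.reverse()
    return [i * h for i in range(n + 1)], pis


def k_q(t):
    """Hypothetical decreasing quantile crash size k_q(t) = 0.2 + 0.1 exp(-0.05 t)."""
    return 0.2 + 0.1 * math.exp(-0.05 * t)


def dk_q(t):
    """Derivative of the hypothetical k_q above."""
    return -0.005 * math.exp(-0.05 * t)


pi0, sigma0, r0, psi1, T = 0.75, 0.25, 0.03, 0.04, 50.0  # hypothetical, r0 < psi1 < psi0
psi0 = r0 + 0.5 * sigma0 ** 2 * pi0 ** 2                  # assumed log-utility relation

ts, pis = solve_eq19(k_q, dk_q, pi0, sigma0, psi0, psi1, T)
h = ts[1] - ts[0]
g = [psi0 - 0.5 * sigma0 ** 2 * (p - pi0) ** 2 for p in pis]
no_crash = h * (sum(g) - 0.5 * (g[0] + g[-1]))               # J_0(0, x, pi) - ln(x), trapezoidal
crash = math.log(1.0 - k_q(0.0) * pis[0]) + psi1 * T         # v_1(0, x(1 - pi(0) k_q(0))) - ln(x)
bound = pi0 - math.sqrt(2.0 / sigma0 ** 2 * (psi0 - psi1))   # upper bound from Theorem 1 (i)
print(round(pis[0], 4), abs(no_crash - crash) < 1e-6, max(pis) < bound)
```

Because $k_q$ decreases here, the extra term $-\frac{k_q'}{k_q}\hat\pi_q$ cannot push the solution across the upper bound. For increasing $k_q$, this need not hold.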

Remark 2 Let us write $\hat\pi_k(t)$ (instead of $\hat\pi_q(t)$) to emphasize the dependence on $k$,
whenever needed. It follows from Eqs. (19) and (20) that

$$\hat\pi_k'(T) = -\frac{1}{k}\left(\Psi_1 - r_0\right) \;\xrightarrow{\;k \downarrow 0\;}\; \begin{cases} \infty & \text{if } \Psi_1 < r_0, \\ 0 & \text{if } \Psi_1 = r_0, \\ -\infty & \text{if } \Psi_1 > r_0. \end{cases} \tag{23}$$

(a) First, observe that this implies that $\hat\pi_q(t) = 0$ if $\Psi_1 = r_0$, that is, this is the only
case where both the optimal q-quantile crash hedging strategy and the optimal
crash hedging strategy are constant. That is, everything is invested in the risk-free
asset if $\Psi_1 = r_0$.

(b) Second, notice that $1/k_2 < 1/k_1$ for $k_1 < k_2$. Hence, $\hat\pi_{k_1} \ge \hat\pi_{k_2}$ with strict
inequality applying on $[0, T)$. Thus, in particular, $\hat\pi_q(t) > \hat\pi(t)$ for $t \in [0, T)$
for any $q$ which satisfies $q(t) < 1$ for $t \in [0, T)$.

(c) Third, for the remainder of this remark, let us consider only the case that $\Psi_1 < \Psi_0$
and $\pi_0 > 0$ (the other cases follow similarly). In this situation, one has that
$\hat\pi_k(t) < \pi_0 - \sqrt{\frac{2}{\sigma_0^2}(\Psi_0 - \Psi_1)}$. Thus, it is clear that

$$\psi(t) := \begin{cases} 0 & \text{for } t = T, \\ \pi_0 - \sqrt{\frac{2}{\sigma_0^2}(\Psi_0 - \Psi_1)} & \text{else} \end{cases}$$

is an upper bound for any $\hat\pi_k$ with $k > 0$. It follows that

$$\hat\pi_k(t) \to \psi(t) \quad \text{pointwise for } k \downarrow 0 \text{ with } k \ne 0,$$

because of the convergence (23). Finally, keep in mind that the case $k = 0$ yields
$\pi_0$ as the optimal portfolio with $\pi_0 \ne \psi$. An example is given in Fig. 4.


Proof (of Theorem 1) If $k_q(t)$ is constant in $t$, this theorem follows from Theorem
4.1 in Korn and Wilmott [9, p. 181] (for generalizations of this theorem see either
Theorem 4.2 in Korn and Menkens [10, p. 135] or Theorem 3.1 in Menkens [16,
p. 603]) by replacing $k^*$ with $k_q$. To verify the differential equation in the general
case, keep in mind that when differentiating the (modified) Eq. (A.5) in Korn and
Wilmott [9, p. 183] (or Eq. (3.1) in Menkens [16, p. 602]) with respect to $t$, $k_q(t)$ also
has to be differentiated with respect to $t$. This leads to the differential equation (19).
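For completeness, the differentiation step can be written out. This derivation is our own sketch. It assumes deterministic strategies and the log-utility forms $J_0(t,x,\pi) = \ln x + \int_t^T \big(\Psi_0 - \frac{\sigma_0^2}{2}(\pi(s) - \pi_0)^2\big)ds$ and $v_1(t,x) = \ln x + \Psi_1(T - t)$. It also uses the indifference relation of Eq. (5) with $k^*$ replaced by $k_q(t)$. The $\ln x$ terms cancel on both sides:

$$\begin{aligned}
&\int_t^T \Big(\Psi_0 - \tfrac{\sigma_0^2}{2}\big(\hat\pi_q(s) - \pi_0\big)^2\Big)\, ds
  = \ln\big(1 - \hat\pi_q(t)k_q(t)\big) + \Psi_1 (T - t) \\
\overset{d/dt}{\Longrightarrow}\;
&-\Psi_0 + \tfrac{\sigma_0^2}{2}\big(\hat\pi_q(t) - \pi_0\big)^2
  = -\frac{\hat\pi_q'(t)k_q(t) + \hat\pi_q(t)k_q'(t)}{1 - \hat\pi_q(t)k_q(t)} - \Psi_1 \\
\Longrightarrow\;
&\hat\pi_q'(t) = \Big(\hat\pi_q(t) - \frac{1}{k_q(t)}\Big)\Big[\tfrac{\sigma_0^2}{2}\big(\hat\pi_q(t) - \pi_0\big)^2 + \Psi_1 - \Psi_0\Big] - \frac{k_q'(t)}{k_q(t)}\,\hat\pi_q(t).
\end{aligned}$$

Setting $t = T$ in the first line makes the integral vanish, so $\ln(1 - \hat\pi_q(T)k_q(T)) = 0$, that is, $\hat\pi_q(T) = 0$. The term $-\frac{k_q'}{k_q}\hat\pi_q$ is exactly the contribution of differentiating $k_q(t)$ mentioned in the proof.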





Fig. 4 Example of $k \to 0$ for $\Psi_1 = \ldots$ and $\pi_0 > 0$


5 Examples 


5.1 Uniformly Distributed Crash Sizes 

Suppose that a crash occurs with probability $p(t) = p$ and that the crash size is
uniformly distributed on $[k_*, k^*]$, that is,

$$f(k, t) = \begin{cases} \dfrac{1}{k^* - k_*} & \text{for } k \in [k_*, k^*], \\ 0 & \text{otherwise.} \end{cases}$$

Using the defining equation for $k_q$, that is,

$$1 - p + p \int_{k_*}^{k_q} \frac{1}{k^* - k_*}\, dk = q,$$

this leads to the following equation for $k_q$:

$$k_q = k_* + \frac{p + q - 1}{p}\,[k^* - k_*].$$

For $q = 1$, we get the worst case back, that is, $k_1 = k^*$, as constructed.
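The closed form is easy to evaluate. The short sketch below (hypothetical helper name) also guards the case $1 - p \ge q$, where the definition of $k_q$ returns 0. The values $p = 0.1$, $k_* = 0.1$, and $k^* = 0.5$ are the ones quoted for Fig. 5.

```python
def k_q_uniform(p: float, q: float, k_low: float, k_up: float) -> float:
    """k_q = k_* + (p + q - 1) / p * (k^* - k_*) for uniform crash sizes; 0 if 1 - p >= q."""
    if 1.0 - p >= q:
        return 0.0
    return k_low + (p + q - 1.0) / p * (k_up - k_low)


print(round(k_q_uniform(0.1, 0.95, 0.1, 0.5), 6))  # 0.3
print(round(k_q_uniform(0.1, 1.0, 0.1, 0.5), 6))   # 0.5, the worst case k^*
print(k_q_uniform(0.1, 0.05, 0.1, 0.5))            # 0.0, since 1 - p >= q
```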


5.2 Conditionally Exponentially Distributed Crash Sizes


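Assume that the crash sizes are exponentially distributed on the interval $[k_*, k^*]$. This
means that

$$f(k, t) = \begin{cases} \dfrac{\lambda e^{-\lambda k}}{e^{-\lambda k_*} - e^{-\lambda k^*}} & \text{for } k \in [k_*, k^*], \\ 0 & \text{otherwise.} \end{cases}$$

With this, $k_q$ calculates to

$$k_q = -\frac{1}{\lambda} \ln\left(\frac{1 - q}{p}\left[e^{-\lambda k_*} - e^{-\lambda k^*}\right] + e^{-\lambda k^*}\right).$$

Again, $k_q = k^*$ for $q = 1$.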
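A quick check of this closed form against its defining equation $1 - p + p\,F(k_q) = q$ follows. Here $F$ is the distribution function of the truncated exponential density above. The helper names are hypothetical. The values $\lambda = 10$, $p = 0.1$, $q = 0.95$, $k_* = 0.1$, and $k^* = 0.5$ are taken from the Fig. 5 discussion for illustration.

```python
import math


def k_q_exponential(p: float, q: float, lam: float, k_low: float, k_up: float) -> float:
    """k_q for crash sizes with density lam*exp(-lam*k) truncated to [k_low, k_up]; 0 if 1 - p >= q."""
    if 1.0 - p >= q:
        return 0.0
    e_low, e_up = math.exp(-lam * k_low), math.exp(-lam * k_up)
    return -math.log((1.0 - q) / p * (e_low - e_up) + e_up) / lam


def truncated_exp_cdf(k: float, lam: float, k_low: float, k_up: float) -> float:
    """Distribution function of the truncated exponential crash size on [k_low, k_up]."""
    e_low, e_up = math.exp(-lam * k_low), math.exp(-lam * k_up)
    return (e_low - math.exp(-lam * k)) / (e_low - e_up)


k = k_q_exponential(0.1, 0.95, 10.0, 0.1, 0.5)
# The defining equation 1 - p + p * F(k_q) = q should hold up to rounding.
print(round(k, 4), abs(1 - 0.1 + 0.1 * truncated_exp_cdf(k, 10.0, 0.1, 0.5) - 0.95) < 1e-12)
```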


5.3 Conditionally Exponentially Distributed Crash Sizes
with Exponentially Distributed Crash Times

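Suppose that not only the crash height has a conditional exponential distribution,
but also the crash time has a conditional distribution, independent of the crash size,
that is,

$$p(t) = q + (p - q)\,\frac{1 - e^{-\theta t}}{1 - e^{-\theta T}}. \tag{24}$$

This means that the probability of a crash happening moves from $q$ at $t = 0$ to
$p$ at $t = T$ in an exponentially decreasing way if $q > p$. The defining equation of $k_q$
reads in this case

$$1 - q - (p - q)\frac{1 - e^{-\theta t}}{1 - e^{-\theta T}} + \left[q + (p - q)\frac{1 - e^{-\theta t}}{1 - e^{-\theta T}}\right]\frac{e^{-\lambda k_*} - e^{-\lambda k_q(t)}}{e^{-\lambda k_*} - e^{-\lambda k^*}} = q.$$

This gives

$$k_q(t) = -\frac{1}{\lambda} \ln\left(e^{-\lambda k^*} + \frac{[1 - q]\,[1 - e^{-\theta T}]\,[e^{-\lambda k_*} - e^{-\lambda k^*}]}{q\,[1 - e^{-\theta T}] + (p - q)\,[1 - e^{-\theta t}]}\right).$$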

Clearly, this is an example where $k_q$ depends on the time $t$. Its derivative calculates
to

$$k_q'(t) = \frac{(p - q)\,[1 - q]\,\theta e^{-\theta t}\,[1 - e^{-\theta T}]\,[e^{-\lambda k_*} - e^{-\lambda k^*}]}{\lambda\left(q[1 - e^{-\theta T}] + (p - q)[1 - e^{-\theta t}]\right)\left((p - q)[1 - e^{-\theta t}]\,e^{-\lambda k^*} + [1 - e^{-\theta T}]\left(e^{-\lambda k_*}[1 - q] - e^{-\lambda k^*}[1 - 2q]\right)\right)}.$$

Fig. 5 The range of (optimal) q-quantile crash hedging strategies for $\Psi_1 = \ldots$ and $\pi_0 > 0$
(horizontal axis: investment horizon $T - t$). This graphic shows $\hat\pi_{k^*}$ (solid line), $\psi$ (dotted line),
the range of possible optimal q-quantile crash hedging strategies (grey area) if $k_q$ is constant, and
$\pi_0$ (solid straight line). The dash-dotted line is a uniformly distributed example (see Sect. 5.1),
the dashed line is an exponentially distributed example (see Sect. 5.2), and the dotted line is a
time-varying example (see Sect. 5.3).


Figure 5 shows the potential range of the optimal $q$-quantile crash hedging strategy
(the gray shaded area) if $k_q(t) \neq 0$ is constant. Obviously, in the case of $k_q(t) = 0$,
one has that $\pi_q(t) = \pi_0$ (that is, the optimal strategy is to invest according to the Merton
fraction). Moreover, if $k_q(t)$ is not constant, the corresponding $q$-quantile crash hedging
strategy can move outside the given range. However, this usually happens only if the derivative
$\frac{dk_q}{dt}$ is sufficiently large, which is not the case in most situations. The parameters used
in these figures are $k^* = 0.5$, $k_* = 0.1$, $\pi_0 = 0.75$, and $\sigma = 0.25$. Additionally, the
examples discussed above are plotted for the choices $p = 0.1$ and $q = 0.95$. The dashed line is
the example of a uniform distribution with $p = 10\,\%$ and $q = 5\,\%$. The dash-dotted line is the
example of an exponential distribution with the additional parameter $\lambda = 10$ (where the other
parameters are as above), and the dotted line is the example of a time-varying crash probability
given in Eq. (24) with the additional parameter $\theta = 0.1$. Notice that the first two examples lead
to strategies similar to those in Korn and Wilmott [9], except that $k^*$ is replaced by $k_q$, which is
constant in those two examples. The third example is clearly different. Starting with an investment
horizon of $T = 50$ years, the optimal strategy is to increase the fraction invested in the risky asset
up to an investment horizon of about 30 years. This is because the probability of a crash happening
is 95 % at $T = 50$ and decreases exponentially to 10 % as the investment horizon is reached.
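The closed-form derivative of $k_q$ for the time-varying example can be cross-checked numerically. The Python sketch below is an illustration only and rests on two assumptions that are inferred from the structure of the derivative rather than quoted from the text: the crash probability is taken as $P(t) = q + (p-q)\frac{1-e^{-\theta t}}{1-e^{-\theta T}}$ (equal to $q$ at $t = 0$ and to $p$ at $t = T$), and $k_q(t)$ solves $F(k_q(t)) = 1 - (1-q)/P(t)$, where $F$ is the exponential crash-size distribution with rate $\lambda$ truncated to $[k_*, k^*]$. The parameter values are those stated in the text; the function names are hypothetical.

```python
import math

# Parameter values from the text; the roles of P and Q in crash_prob are assumed
K_UPPER = 0.5     # k^*, largest crash size
K_LOWER = 0.1     # k_*, smallest crash size
LAM = 10.0        # lambda, rate of the truncated exponential crash-size law
THETA = 0.1       # theta, speed of the time variation
P, Q = 0.1, 0.95  # p and q of the time-varying example
T = 50.0          # investment horizon in years


def crash_prob(t):
    """Assumed crash probability: equals Q at t = 0 and P at t = T."""
    return Q + (P - Q) * (1 - math.exp(-THETA * t)) / (1 - math.exp(-THETA * T))


def k_q(t):
    """Assumed quantile crash size: solves F(k) = 1 - (1 - Q) / crash_prob(t),
    F being the exponential law with rate LAM truncated to [K_LOWER, K_UPPER]."""
    level = 1 - (1 - Q) / crash_prob(t)
    e_low, e_up = math.exp(-LAM * K_LOWER), math.exp(-LAM * K_UPPER)
    return -math.log(e_low - level * (e_low - e_up)) / LAM


def dk_q_dt(t):
    """Closed form of the derivative (second line of the displayed equation)."""
    a = 1 - math.exp(-THETA * T)
    b = 1 - math.exp(-THETA * t)
    e_low, e_up = math.exp(-LAM * K_LOWER), math.exp(-LAM * K_UPPER)
    num = (P - Q) * (1 - Q) * THETA * math.exp(-THETA * t) * a * (e_low - e_up)
    den = LAM * (Q * a + (P - Q) * b) * (
        (P - Q) * b * e_up + a * (e_low * (1 - Q) - e_up * (1 - 2 * Q)))
    return num / den


def central_difference(f, t, h=1e-5):
    """Second-order finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


for t in (0.0, 10.0, 30.0, 49.0):
    print(f"t={t:4.0f}  k_q={k_q(t):.4f}  closed form={dk_q_dt(t):.6e}  "
          f"finite difference={central_difference(k_q, t):.6e}")
```

Under these assumptions the closed form and the finite difference agree, and the derivative is negative because $p < q$.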


6 Deterministic Portfolio Strategies 

Definition 3 Let $\pi$ be an admissible portfolio strategy. The strategy

$$\pi_d(t) := \mathbb{E}[\pi(t)] \quad \text{for all } t \in [0,T]$$

will be called the deterministic portfolio strategy corresponding to $\pi$.

If $\pi_d \equiv \pi$, then $\pi_d$ is admissible because $\pi$ is admissible. If $\pi_d(t) \neq \pi(t)$ for
some $t \in [0,T]$, then there exist $\omega_1, \omega_2 \in \Omega$, which depend on $t$, such that

$$\pi(t,\omega_1) < \pi_d(t) < \pi(t,\omega_2).$$

Thus, $\pi_d$ is bounded and therefore admissible.

Definition 4 Let us define

$$k_\pi(t) := k^*\cdot\mathbb{1}_{\{\pi(t)\ge 0\}} + k_*\cdot\mathbb{1}_{\{\pi(t)<0\}}.$$

Lemma 1 Let $\pi$ be an admissible portfolio strategy. Then the corresponding deterministic
portfolio strategy $\pi_d$ yields, in the initial crash-free market, at least the same expected utility
of terminal wealth as $\pi$. If additionally $\pi(t) < 1/k^*$ holds for all $t \in [0,T]$, then $\pi_d$ yields,
in the initial market with a possible crash, at least the same worst-case expected utility of
terminal wealth as $\pi$.

Remark 3 This Lemma is important because an optimization problem is often solved
only on the set of deterministic strategies (see, e.g., Korn and Wilmott [9], Korn and
Menkens [10], or Christiansen [6]) and not on the set of stochastic strategies (which
include the deterministic ones). This is done because it is often simpler to solve the
optimization problem on the set of deterministic strategies.
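Proof (of Lemma 1) Using Fubini's theorem, one has for any admissible portfolio
strategy $\pi$

$$
\begin{aligned}
v_0(t,x,\pi) &= \ln(x) + \mathbb{E}\left[\int_t^T \Big(r + \frac{\sigma^2}{2}\pi_0^2 - \frac{\sigma^2}{2}\big(\pi(s)-\pi_0\big)^2\Big)\,ds\right]\\
&= \ln(x) + \int_t^T \Big(r + \frac{\sigma^2}{2}\pi_0^2 - \frac{\sigma^2}{2}\,\mathbb{E}\big[(\pi(s)-\pi_0)^2\big]\Big)\,ds\\
&= \ln(x) + \int_t^T \Big(r + \frac{\sigma^2}{2}\pi_0^2 - \frac{\sigma^2}{2}\big(\mathbb{E}[\pi(s)]-\pi_0\big)^2 - \frac{\sigma^2}{2}\operatorname{Var}\big(\pi(s)\big)\Big)\,ds\\
&= \ln(x) + \int_t^T \Big(r + \frac{\sigma^2}{2}\pi_0^2 - \frac{\sigma^2}{2}\big(\pi_d(s)-\pi_0\big)^2\Big)\,ds - \frac{\sigma^2}{2}\int_t^T \operatorname{Var}\big(\pi(s)\big)\,ds\\
&= v_0(t,x,\pi_d) - \frac{\sigma^2}{2}\int_t^T \operatorname{Var}\big(\pi(s)\big)\,ds\\
&\le v_0(t,x,\pi_d).
\end{aligned}
$$

This is the case if no crash happens. In the case that a crash has happened, one gets
with the definition

$$\Delta_\pi(t) := \ln\big(1-\mathbb{E}[\pi(t)]\,k_{\pi_d}(t)\big) - \mathbb{E}\big[\ln\big(1-\pi(t)k_\pi(t)\big)\big]$$

the following:

$$
\begin{aligned}
\mathbb{E}\big[v_1\big(t,x(1-\pi(t)k_\pi(t))\big)\big] &= \ln(x) + \mathbb{E}\big[\ln\big(1-\pi(t)k_\pi(t)\big)\big] + \Big(r+\frac{\sigma^2}{2}\pi_0^2\Big)(T-t)\\
&= \ln(x) + \ln\big(1-\pi_d(t)k_{\pi_d}(t)\big) + \Big(r+\frac{\sigma^2}{2}\pi_0^2\Big)(T-t) - \Delta_\pi(t)\\
&= v_1\big(t,x(1-\pi_d(t)k_{\pi_d}(t))\big) - \Delta_\pi(t)\\
&\le v_1\big(t,x(1-\pi_d(t)k_{\pi_d}(t))\big),
\end{aligned}
$$

where it has been used for the last inequality that $\Delta_\pi(t) \ge 0$. This follows from Jensen's
inequality for the concave logarithm, combined with the convexity of $\pi \mapsto \pi k_\pi$ (so that
$\mathbb{E}[\pi(t)k_\pi(t)] \ge \mathbb{E}[\pi(t)]\,k_{\pi_d}(t)$); it is applicable if $1-\pi(t)k_\pi(t) > 0$. The latter holds for
$\pi(t) < 1/k^*$, which is the assumption. This proves the assertion.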
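The crash-free part of the proof rests on the decomposition $\mathbb{E}[(\pi(s)-\pi_0)^2] = (\mathbb{E}[\pi(s)]-\pi_0)^2 + \operatorname{Var}(\pi(s))$. The Python sketch below checks it on simulated data: expectations are replaced by sample means over hypothetical Gaussian draws of $\pi(s)$ on a time grid, and the time integral by a Riemann sum. The market parameters $r$, $\sigma$, $\pi_0$, the remaining horizon of one year and the sampled strategy are illustrative assumptions, not values from the text.

```python
import math
import random
import statistics


def v0(x, samples_per_step, r, sigma, pi0, dt):
    """Riemann-sum version of
    ln x + int_t^T ( r + sigma^2/2 * pi0^2 - sigma^2/2 * E[(pi(s) - pi0)^2] ) ds,
    where samples_per_step[k] holds draws of pi at the k-th grid point and the
    expectation is replaced by the sample mean."""
    integral = 0.0
    for samples in samples_per_step:
        mean_sq = statistics.fmean((p - pi0) ** 2 for p in samples)
        integral += (r + 0.5 * sigma**2 * pi0**2 - 0.5 * sigma**2 * mean_sq) * dt
    return math.log(x) + integral


rng = random.Random(1)
r, sigma, pi0, x = 0.02, 0.25, 0.75, 1.0   # illustrative market parameters
n_steps, n_paths = 50, 2000
dt = 1.0 / n_steps                          # remaining horizon T - t = 1 (assumed)

# Hypothetical stochastic strategy pi(s) = 0.6 + 0.3 * Z_s with Z_s ~ N(0, 1)
stochastic = [[0.6 + 0.3 * rng.gauss(0.0, 1.0) for _ in range(n_paths)]
              for _ in range(n_steps)]
# Corresponding deterministic strategy pi_d(s) = E[pi(s)], here the sample mean
deterministic = [[statistics.fmean(samples)] for samples in stochastic]

gap = v0(x, deterministic, r, sigma, pi0, dt) - v0(x, stochastic, r, sigma, pi0, dt)
variance_term = 0.5 * sigma**2 * sum(statistics.pvariance(s) for s in stochastic) * dt
print(f"v0(pi_d) - v0(pi) = {gap:.6f}, sigma^2/2 * int Var = {variance_term:.6f}")
```

The gap between the two values equals the variance term up to rounding, which is exactly the quantity subtracted in the fifth line of the derivation.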

Remark 4 The condition $\pi(t) < 1/k^*$ is natural if a crash of size $k^*$ can happen,
because it prevents the investor from going bankrupt. Since $k^* < 1$, the condition
means that the investor is not allowed to be too highly leveraged.
7 Conclusion


It has been shown that the worst-case scenario approach of Korn and Wilmott [9] does
not make use of additional probabilistic information about a crash happening. This is
overcome by introducing a $q$-quantile approach, which is a Value at Risk ansatz to the
worst-case scenario method. Examples are given; in particular, one extreme example
shows that it is possible with the $q$-quantile approach to obtain optimal portfolio
strategies which are first increasing and then decreasing. Finally, it is shown that no
stochastic portfolio strategy gives a higher expected utility of terminal wealth (or
a higher worst-case scenario bound) than the corresponding deterministic portfolio
strategy (defined by taking the expectation of the stochastic portfolio strategy).
Improving Optimal Terminal Value
Replicating Portfolios 


Jan Natolski and Ralf Werner 


Abstract Currently, several large life insurance companies apply the replicating 
portfolio technique for valuation and risk management of their liabilities. In [7], the 
two most common approaches, cash-flow matching and terminal value matching, 
have been investigated from a theoretical perspective and it has been shown that 
optimal terminal value replicating portfolios are not suitable to replicate liability 
cash-flows by construction. Thus, their usage for asset liability management is rather 
restricted, especially for out-of-sample cash profiles of liabilities. In this paper, we 
therefore enhance the terminal value approach by an additional linear regression of 
the corresponding optimal dynamic numeraire strategy to overcome this drawback. 
We show that terminal value matching together with an approximated dynamic strat- 
egy has in-sample and out-of-sample performance very close to the optimal cash- 
flow matching portfolio and, due to computational advantages, can thus be used as an 
alternative for cash-flow matching, especially in risk and asset liability management. 


1 Introduction 

In recent years, market-consistent valuation has become the standard approach
toward risk management of life insurance policies, see for example [3]. Due to the 
complexity of life insurance contracts, most academics and practitioners resort to 
Monte Carlo methods for valuation purposes. However, the difficulty is to find a 
computationally efficient yet sufficiently accurate algorithm. For instance, contracts 
may include surrender options, which allow the policy holder every year to cancel 
the contract and withdraw the value of her account. In this context, [1,2] and several 
other authors therefore resort to the well-known least squares Monte Carlo approach,
which was originally introduced by [6] to price American options. In contrast, [9] first 
suggested valuation of with-profits guaranteed annuity options, which are typical life 
insurance products, via static replicating portfolios. To hedge against interest rate risk, 
a portfolio is built of vanilla swaptions and a remarkably good fit of the market value 
of annuity options is obtained. The purpose of constructing a replicating portfolio 
is to approximate the liability cash-flows of an insurance company by a portfolio 
formed by a finite number of selected financial instruments. If the approximation 
is accurate, one obtains a good estimate of the market value of liabilities from the 
fair value of the replicating portfolio. In the current literature, two portfolio construction
approaches stand out. The first one aims to match liability cash-flows and cash-flows 
of the replicating portfolio at each time point. The second one is less restrictive as 
it only demands that accrued terminal values of the cash-flows match well at some 
final time horizon T . 

For risk purposes, insurance companies want to compute the fair value of their 
assets and liabilities, i.e., the market consistent embedded value (MCEV) under 
shifted market conditions now or one year in the future. More precisely, having 
found a replicating portfolio which matches the fair value of liabilities under current 
market conditions, one performs instantaneous shocks on known parameters (such 
as volatility, forward rate curve, etc.) and checks if fair values are still matched. This 
is commonly referred to as a comparison of sensitivities between the fair value of the 
replicating portfolio and the fair value of liabilities. If sensitivities are similar, it is 
usually assumed that fair values will be roughly matched one year in the future, even
if rare events at the 99.5 % quantile level take place. For instance, this is the motivation
for [4] to put additional constraints in the optimization problem to guarantee that
fair values are close to one another under various stress scenarios. Figure 1 illustrates
the dependence of the fair value of the liabilities and of a replicating portfolio on the
initial asset prices. It can be observed that the fair values are close to each other and
behave quite similarly, but not identically.

For the purpose of improving terminal value matching, we start with the setup 
as given in [7], that is, we consider the cash-flow matching problem and the termi- 
nal value matching problem as proposed in [9] and [8], respectively, and relax the 
requirement of static replication by allowing for dynamic investment strategies in 
the numeraire asset. We briefly review the theoretical results derived therein, before 
we investigate in more detail the benefit of our approach based on market scenarios 
generated by an insurance company: First, we compare the in-sample and out-of- 
sample performance of the two replicating portfolios. Then, in the main contribution 
of this article, we take a closer look at the optimal dynamic investment strategy and 
approximate it by a time-dependent linear combination of the replicating assets. In 
our particular example, the approximation turns out to be remarkably accurate as 
in-sample and out-of-sample tests will show.



Fig. 1 Fair value of liabilities and of the replicating portfolio depending on initial asset prices 


2 The Mathematical Setup 

This setup roughly follows that of [3]: Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in\mathcal{T}}, Q)$ be a filtered
probability space¹ in discrete time $\mathcal{T} := \{t = 0, 1, \dots, T\}$ with risk-neutral measure $Q$.
On this probability space, we introduce a frictionless, arbitrage-free financial market
as follows.

• Let $(R^F_t)_{t\in\mathcal{T}}$ be Markovian financial risk factors (e.g., interest rates).

• Let $(R^L_t)_{t\in\mathcal{T}}$ be Markovian risk factors, independent of $(R^F_t)_{t\in\mathcal{T}}$,
affecting the liabilities of an insurance company (e.g., mortality rates).

• $(R^F_t)_{t\in\mathcal{T}}$ and $(R^L_t)_{t\in\mathcal{T}}$ generate filtrations $(\mathcal{F}^F_t)_{t\in\mathcal{T}}$ and $(\mathcal{F}^L_t)_{t\in\mathcal{T}}$, respectively.
We assume $\mathcal{F}_t = \mathcal{F}^F_t \vee \mathcal{F}^L_t$ for all $t \in \mathcal{T}$.

• There are $m$ financial assets, an $\mathbb{R}^{d}$-valued $(\mathcal{F}^F_t)$-adapted process $D^F$ (e.g.,
risk factors, book values, moving averages, etc.) and a function $C^F_i : \{1,\dots,T\}\times\mathbb{R}^{d} \to \mathbb{R}$
for each asset $i$ such that $C^F_i(t, D^F_t)$ is the cash payment of asset $i$ at time $t$. At $T$,
this cash payment represents the remaining value of the asset. $D^L$ and $C^L$ are defined
analogously; however, liabilities may also depend on the financial risk factors $D^F$,
i.e., $C^L = C^L(t, D^F_t, D^L_t)$.


1 As in [7], we assume that all technical requirements are fulfilled (square integrability,
completeness of the filtration, ...).
• $N$ denotes the numeraire (with initial value $N_0 = 1$, paying no intermediate
cash-flows), which is used in the dynamic investment strategy. We assume that
$N_T$ is paid as a cash-flow at the final time horizon. For convenience, let us write
$C^F_0(t, D^F_t)$ for the cash payment of the numeraire at time $t$, that is

$$C^F_0(t,D^F_t) = \begin{cases} 0, & t = 1,\dots,T-1,\\ N_T, & t = T.\end{cases}$$

Next, we review the two most commonly used approaches for the construction of a 
replicating portfolio. 


3 The Theory of Replicating Portfolios 


3.1 Cash-Flow Matching 


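One possibility, proposed by [9], is to look for a portfolio $(\alpha_0,\dots,\alpha_m) \in \mathbb{R}^{m+1}$
which solves the optimization problem²

$$\min_{\alpha\in\mathbb{R}^{m+1}} \sum_{t=1}^T \left(\mathbb{E}^Q\left[\left(\frac{C^L(t,D^F_t,D^L_t)}{N_t} - \sum_{i=0}^m \alpha_i\,\frac{C^F_i(t,D^F_t)}{N_t}\right)^2\right]\right)^{1/2} \qquad \text{(RPcf)}$$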


The objective function penalizes the difference between two cash payments at each 
time t . The role of the discounting factor 1 /N t is to assign equal weight to mismatches 
of equal size in terms of their discounted value. An alternative approach is discounted 
terminal value matching. 
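When a finite set of scenarios replaces the expectation under $Q$, the cash-flow matching objective becomes a sample average. The Python sketch below assumes that the objective is a sum over $t$ of root-mean-square discounted mismatches; the array layout and function names are hypothetical. The second function divides by the sample fair value of the liabilities (the scenario average of $\sum_t C^L/N_t$), which is one way to obtain relative errors of the kind reported in Table 2.

```python
import math


def rp_cf_objective(alpha, liab, assets, numeraire):
    """Sample cash-flow matching objective: for each time t, the root mean
    square (over scenarios) of the discounted mismatch, summed over t.

    liab[w][t]      liability cash payment C^L in scenario w at time index t
    assets[w][i][t] cash payment C^F_i of asset i in scenario w at time index t
    numeraire[w][t] numeraire value N_t in scenario w
    """
    n_scen, n_times = len(liab), len(liab[0])
    total = 0.0
    for t in range(n_times):
        mean_sq = 0.0
        for w in range(n_scen):
            port = sum(a_i * assets[w][i][t] for i, a_i in enumerate(alpha))
            mean_sq += ((liab[w][t] - port) / numeraire[w][t]) ** 2 / n_scen
        total += math.sqrt(mean_sq)
    return total


def relative_cf_error(alpha, liab, assets, numeraire):
    """Objective divided by the sample fair value of the liabilities."""
    fair_value = sum(sum(l / n for l, n in zip(liab[w], numeraire[w]))
                     for w in range(len(liab))) / len(liab)
    return rp_cf_objective(alpha, liab, assets, numeraire) / fair_value


# Tiny hypothetical example: 2 scenarios, 2 payment dates, 1 asset
liab = [[2.0, 4.0], [3.0, 1.0]]
assets = [[[1.0, 2.0]], [[1.5, 0.5]]]
numeraire = [[1.0, 1.1], [1.0, 1.2]]
print(rp_cf_objective([2.0], liab, assets, numeraire))  # 0.0: exact replication
print(relative_cf_error([1.0], liab, assets, numeraire))
```

Holding two units of the asset replicates the liabilities exactly in this toy data, so the objective vanishes; any other holding leaves a positive mismatch at some date.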


3.2 Discounted Terminal Value Matching 

The terminal value of a cash-flow is obtained by summing all cash payments accrued 
to the terminal time T with the risk-free interest rate. By discounted terminal value, we 
mean the accrued terminal value discounted to the present. In mathematical notation, 
the discounted accrued liability cash-flow and the discounted accrued cash-flow of 
a replicating portfolio $\alpha = (\alpha_0,\dots,\alpha_m) \in \mathbb{R}^{m+1}$ are given by


2 The existence of a minimum has been shown in [7]. 
$$A^L := \sum_{t=1}^T \frac{C^L(t,D^F_t,D^L_t)}{N_t}, \qquad A^F(\alpha) := \sum_{t=1}^T \sum_{i=0}^m \alpha_i\,\frac{C^F_i(t,D^F_t)}{N_t}.$$

The observation that two cash-flows may have entirely different cash payment profiles
and still have the same fair value leads to the alternative optimization problem³

$$\min_{\alpha\in\mathbb{R}^{m+1}} \left(\mathbb{E}^Q\left[\big(A^L - A^F(\alpha)\big)^2\right]\right)^{1/2} \qquad \text{(RPtv)}$$

Originally, this problem was introduced by [8], with the difference that they considered
non-discounted terminal values.


4 Equivalence of Cash-Flow Matching and Discounted Terminal 
Value Matching 


Next, we recall the connection between (RPcf) and (RPtv) as established in [7]. If
the numeraire asset can be bought or sold at any time, problems (RPcf) and (RPtv)
are practically the same. The brief explanation is that cash-flow mismatches can be
offset by an appropriate strategy in this asset. These mismatches then sum up to
the discounted terminal value mismatch, and thus problems (RPcf) and (RPtv) are
intimately linked.

In more detail, suppose that the insurance company is allowed to invest and finance
cash-flows from trading the numeraire asset at all times $t = 1,\dots,T$. Define the
following linear space of processes

$$\mathcal{A} := \left\{\delta = (\delta_t)_{t=1,\dots,T} : \delta_t \in L^2(\Omega,\mathcal{F}_t,Q)\ \ \forall t = 1,\dots,T-1,\ \ \sum_{t=1}^T \delta_t = 0\right\}.$$

Any process from this space represents an adapted strategy of investments in or
borrowing from the numeraire asset, so $\delta_t$ is the number of units of the numeraire asset
bought or sold short at time $t$. Here, $\delta_t > 0$ is interpreted as a purchase, which corresponds
to a negative cash-flow for the insurer, and $\delta_t < 0$ as a sale, which corresponds to a positive
cash-flow. The condition $\sum_{t=1}^T \delta_t = 0$ ensures that strategies have zero discounted
terminal value. Note that strategies from $\mathcal{A}$ are not necessarily predictable. At each
time point, the insurer can incorporate all information available at that time to make
a decision on the trade $\delta_t$. Only at $T$ is the insurer bound to clear the balance, thus
making $\delta_T$ predictable.


3 For the existence of a minimum, see [7]. 
The introduction of such strategies turns out to be the key link between problems
(RPcf) and (RPtv): The discounted terminal value $A^F(\alpha,\delta)$ corresponding to an
investment strategy $(\alpha,\delta)$ with $\alpha \in \mathbb{R}^{m+1}$, $\delta \in \mathcal{A}$ is given by

$$A^F(\alpha,\delta) = A^F(\alpha),$$

where

$$A^F(\alpha,\delta) := \sum_{t=1}^T\left(\sum_{u=0}^m \alpha_u\,\frac{C^F_u(t,D^F_t)}{N_t} - \delta_t\right).$$

In other words, the discounted terminal value only depends on the initial portfolio
$\alpha_0,\dots,\alpha_m$ in the assets. Thus, we write $A^F(\alpha)$ instead of $A^F(\alpha,\delta)$. We say
that two investment strategies $(\alpha,\delta)$ and $(\beta,\tilde\delta)$ with $\alpha,\beta \in \mathbb{R}^{m+1}$, $\delta,\tilde\delta \in \mathcal{A}$ are
FV-equivalent iff

$$A^F(\alpha) = A^F(\beta).$$

Note that, due to the above, the initial portfolios of two FV-equivalent investment
strategies have equal fair value, as they produce identical discounted terminal values.
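The invariance $A^F(\alpha,\delta) = A^F(\alpha)$ uses nothing but the constraint $\sum_t \delta_t = 0$. The Python sketch below checks it for a single hypothetical scenario in which asset 0 plays the role of the numeraire (it pays $N_T$ at $T$ only); all numbers and function names are made up for illustration.

```python
import random


def discounted_terminal_value(alpha, delta, assets, numeraire):
    """A^F(alpha, delta) for one scenario: sum_t ( sum_i alpha_i C^F_i(t)/N_t - delta_t ).
    assets[i][t] is the cash payment of asset i at time index t; delta[t] is the
    number of numeraire units bought (> 0) or sold (< 0) at time index t."""
    return sum(
        sum(a_i * assets[i][t] for i, a_i in enumerate(alpha)) / numeraire[t] - delta[t]
        for t in range(len(numeraire))
    )


def random_admissible_delta(n_times, rng):
    """Hypothetical element of the strategy space: arbitrary trades before T,
    and a final trade that clears the balance so that sum_t delta_t = 0."""
    delta = [rng.uniform(-1.0, 1.0) for _ in range(n_times - 1)]
    return delta + [-sum(delta)]


rng = random.Random(0)
alpha = [1.0, 0.5]                 # hypothetical holdings
numeraire = [1.01, 1.02, 1.30]     # hypothetical N_t for t = 1, 2, 3
assets = [[0.0, 0.0, 1.30],        # asset 0 = numeraire, pays N_T at T
          [0.2, 0.3, 1.10]]        # some other asset
static = discounted_terminal_value(alpha, [0.0, 0.0, 0.0], assets, numeraire)
delta = random_admissible_delta(3, rng)
dynamic = discounted_terminal_value(alpha, delta, assets, numeraire)
print(static, dynamic)             # equal up to rounding
```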

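Based on the extension from static portfolios to partially dynamic strategies, we
define the corresponding optimization problems

$$\inf_{\alpha\in\mathbb{R}^{m+1},\,\delta\in\mathcal{A}} \sum_{t=1}^T\left(\mathbb{E}^Q\left[\left(\frac{C^L(t,D^F_t,D^L_t)}{N_t} - \left(\sum_{i=0}^m \alpha_i\,\frac{C^F_i(t,D^F_t)}{N_t} - \delta_t\right)\right)^2\right]\right)^{1/2} \qquad \text{(GRPcf)}$$

the generalized cash-flow matching problem, and

$$\inf_{\alpha\in\mathbb{R}^{m+1},\,\delta\in\mathcal{A}} \left(\mathbb{E}^Q\left[\big(A^L - A^F(\alpha,\delta)\big)^2\right]\right)^{1/2} \qquad \text{(GRPtv)}$$

the generalized discounted terminal value matching problem. Based on the following
two additional weak assumptions, the main results follow.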


Assumption 1 The matrix $Q^F$ is positive definite, where

$$Q^F_{ij} := \mathbb{E}^Q\big[A^F_i A^F_j\big], \qquad i,j = 0,\dots,m,$$

with the discounted terminal value $A^F_i$ of the cash-flow generated by asset $i$ given as

$$A^F_i := \sum_{t=1}^T \frac{C^F_i(t,D^F_t)}{N_t}.$$

Assumption 2 Let $\alpha^{opt} = (\alpha^{opt}_0, \alpha^{opt}_1, \dots, \alpha^{opt}_m)$ be the solution to (RPtv). The
cash-flow mismatch $C^L(T,D^F_T,D^L_T) - \sum_{i=0}^m \alpha^{opt}_i C^F_i(T,D^F_T)$ is not $\mathcal{F}_{T-1}$-measurable.

The following properties of the two optimization problems and their connections
were derived in [7].

1. Properties of (GRPtv) and the relationship to (RPtv):

a. Under Assumption 1, the solution to (RPtv) exists, is unique and is given by
$\alpha^{opt} = (Q^F)^{-1}\,\mathbb{E}^Q[A^L A^F]$ with $A^F := (A^F_0,\dots,A^F_m)^\top$. The set of solutions to (GRPtv) is the
FV-equivalence class of the solution to (RPtv).

b. The optimal value of (GRPtv) is equal to the optimal value of (RPtv).

2. Properties of (GRPcf) and the relationship to (RPtv), (GRPtv) and (RPcf):

a. Under Assumptions 1 and 2, the solution to (GRPcf) exists and is unique,
with initial portfolio given by the solution to (RPtv) and a strategy $\delta \in \mathcal{A}$
such that cash-flows are perfectly matched at times $t = 1,\dots,T-1$.

b. Under Assumption 1, the set of solutions to (GRPtv) is the FV-equivalence
class of the solution to (GRPcf).

c. The optimal value of (GRPcf) is smaller than or equal to the optimal value
of (RPcf). Under Assumptions 1 and 2, equality is achieved iff for times
$t = 1,\dots,T-1$ the liability cash-flow is perfectly replicated by the cash-flow
of the portfolio solving (RPtv).

3. Fair values of (GRPtv) and (GRPcf):

a. The fair value of the solutions to (GRPtv) and the fair value of the solution
to (GRPcf) are equal to the fair value of the liability cash-flow.
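Properties 1.a, 2.a and 3.a can be illustrated on simulated data. The Python sketch below uses made-up discounted cash-flows $C/N_t$ (asset 0 is the numeraire and pays 1 in discounted terms at $T$), replaces expectations by scenario averages, and assumes that the cash-flow objectives are sums of root-mean-square norms over time. It computes a sample version of $\alpha^{opt} = (Q^F)^{-1}\mathbb{E}^Q[A^L A^F]$ from the normal equations, builds the numeraire strategy that offsets every mismatch before $T$, and compares the resulting cash-flow objective with the terminal value objective. All names are hypothetical.

```python
import math
import random


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def alpha_opt(a_liab, a_assets):
    """Sample version of (Q^F)^{-1} E[A^L A^F] with Q^F_ij = E[A^F_i A^F_j]."""
    n, k = len(a_liab), len(a_assets[0])
    q = [[sum(a[i] * a[j] for a in a_assets) / n for j in range(k)] for i in range(k)]
    b = [sum(l * a[i] for l, a in zip(a_liab, a_assets)) / n for i in range(k)]
    return solve(q, b)


def matched_residuals(alpha, d_liab, d_assets):
    """Residuals m_t + delta_t in one scenario when delta_t = -m_t for t < T
    and the final trade clears the balance (the strategy of Property 2.a)."""
    m = [d_liab[t] - sum(a * d_assets[i][t] for i, a in enumerate(alpha))
         for t in range(len(d_liab))]
    delta = [-x for x in m[:-1]]
    delta.append(-sum(delta))
    return [x + d for x, d in zip(m, delta)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


rng = random.Random(42)
n_scen, n_times = 200, 5
d_assets = [[[0.0] * (n_times - 1) + [1.0],                     # numeraire
             [rng.uniform(0.5, 1.5) for _ in range(n_times)],
             [rng.uniform(0.0, 2.0) for _ in range(n_times)]] for _ in range(n_scen)]
d_liab = [[1.2 * s[1][t] + 0.3 * s[2][t] + rng.gauss(0.0, 0.1) for t in range(n_times)]
          for s in d_assets]

a_liab = [sum(row) for row in d_liab]                     # A^L per scenario
a_assets = [[sum(flow) for flow in s] for s in d_assets]  # A^F_i per scenario
alpha = alpha_opt(a_liab, a_assets)

tv_mismatch = [l - sum(a * x for a, x in zip(alpha, xs)) for l, xs in zip(a_liab, a_assets)]
residuals = [matched_residuals(alpha, d_liab[w], d_assets[w]) for w in range(n_scen)]
grp_cf = sum(rms([res[t] for res in residuals]) for t in range(n_times))
rp_cf = sum(rms([d_liab[w][t] - sum(a * d_assets[w][i][t] for i, a in enumerate(alpha))
                 for w in range(n_scen)]) for t in range(n_times))
print(f"alpha = {alpha}")
print(f"generalized CF value = {grp_cf:.5f}, TV value = {rms(tv_mismatch):.5f}, "
      f"static CF value at alpha = {rp_cf:.5f}")
```

With the sum-of-norms form, the triangle inequality shows why offsetting every mismatch before $T$ is optimal: the terminal residual then equals the discounted terminal value mismatch, and the scenario average of that mismatch is zero because the numeraire acts as an intercept in the normal equations.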

The main drawback of the generalized terminal value approach lies in the introduction
of the dynamic strategy in the numeraire asset: as the optimal $\delta_t$ depend on
the liability cash-flow (see Property 2.a above), this strategy is not available
out-of-sample to reproduce (unknown!) liability cash-flows. Although the main purpose
of replicating portfolios in risk management is fair value replication, asset liability
management usually requires cash-flow replication as well.

Therefore, the optimal numeraire strategy has to be estimated based on the information
available up to time $t$, which then in turn allows a reproduction of liability cash-flows,
even in a terminal value approach. The simplest approach to this end is a standard
linear regression of the optimal $\delta_t$ against the information available at time $t$. Besides
the obvious use of prices of financial instruments as explanatory variables, any further
available information (e.g., non-traded risk factors such as interest rates) could in theory
be used for this purpose.

Starting with the portfolio solving (RPtv), we compute $(\delta_t)_{t=1,\dots,T}$ such that
cash-flows are perfectly matched in-sample except at $T$. The idea is to approximate
$\delta_t$, $t = 1,\dots,T-1$, by an ordinary linear regression, that is
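$$\hat\delta_t(a) := a^1_t\,\frac{C^F_{CA}(t,D^F_t)}{N_t} + a^2_t\,\frac{C^F_{SP}(t,D^F_t)}{N_t} + a^3_t\,\frac{C^F_{NK}(t,D^F_t)}{N_t}, \qquad t = 1,\dots,T-1,$$

$$\hat\delta_T(a) := -\sum_{t=1}^{T-1}\hat\delta_t(a),$$

where $a \in \mathbb{R}^{(T-1)\times 3}$ solves the problem

$$\min_{a\in\mathbb{R}^{(T-1)\times 3}} \sum_{t=1}^T\left(\mathbb{E}^Q\left[\left(\frac{C^L(t,D^F_t,D^L_t)}{N_t} - \left(\sum_{i=0}^m \alpha^{opt}_i\,\frac{C^F_i(t,D^F_t)}{N_t} - \hat\delta_t(a)\right)\right)^2\right]\right)^{1/2}.$$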
In other words, we solve (GRPcf) with $\alpha^{opt}_0,\dots,\alpha^{opt}_m$ fixed and optimal for (RPtv),
and with $\delta$ restricted to have the form above. Note that the parameters $(a^1_t, a^2_t, a^3_t)_{t=1,\dots,T-1}$
are known to the insurer at present. The hope is that the portfolio obtained from matching
discounted terminal values, together with the dynamic investment strategy $(\hat\delta_t)_{t=1,\dots,T}$,
will produce an out-of-sample objective value at least similar to that of the static portfolio
obtained from solving (RPcf).
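The regression step can be sketched as follows. The text describes the fit as an ordinary linear regression; this Python sketch performs one least-squares fit without intercept per time $t < T$ of the in-sample optimal $\delta_t$ on three discounted asset cash-flows. That per-date fit is a simplification and need not coincide with the joint minimization stated in the text. The last trade is set to clear the balance so that the fitted strategy keeps zero discounted terminal value. The synthetic data and function names are hypothetical.

```python
import random


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m, v):
    """Solve the 3x3 system m x = v by Cramer's rule."""
    d = det3(m)
    return [det3([row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]) / d
            for k in range(3)]


def fit_numeraire_strategy(delta, regressors):
    """Per-date least squares (no intercept) of delta[w][t] on regressors[w][t]
    (three values each) for t < T; returns the coefficients a_t and the fitted
    strategies, whose last trade clears the balance."""
    n_scen, n_times = len(delta), len(delta[0])
    coeffs = []
    for t in range(n_times - 1):
        xtx = [[sum(regressors[w][t][i] * regressors[w][t][j] for w in range(n_scen))
                for j in range(3)] for i in range(3)]
        xty = [sum(regressors[w][t][i] * delta[w][t] for w in range(n_scen))
               for i in range(3)]
        coeffs.append(solve3(xtx, xty))
    fitted = []
    for w in range(n_scen):
        path = [sum(c * x for c, x in zip(coeffs[t], regressors[w][t]))
                for t in range(n_times - 1)]
        fitted.append(path + [-sum(path)])
    return coeffs, fitted


# Synthetic check: if delta_t is exactly linear in the regressors, the fit recovers it
rng = random.Random(7)
n_scen, n_times = 100, 6
true_a = [[1.0 + 0.1 * t, -0.5, 0.2] for t in range(n_times - 1)]
regressors = [[[rng.uniform(0.5, 1.5) for _ in range(3)] for _ in range(n_times)]
              for _ in range(n_scen)]
delta = []
for w in range(n_scen):
    path = [sum(a * x for a, x in zip(true_a[t], regressors[w][t]))
            for t in range(n_times - 1)]
    delta.append(path + [-sum(path)])

coeffs, fitted = fit_numeraire_strategy(delta, regressors)
print(coeffs[0])
```

In practice the regressand would be the in-sample strategy that perfectly offsets the cash-flow mismatches, so the fit is no longer exact and its quality can be summarized by coefficients of determination such as those reported in Table 3.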


5 Example 


Based on financial market scenarios provided by a life insurer, we carry out some 
numerical analysis to compare the performance of the portfolios solving (RPcf) and
(RPtv). The results above imply that in an in-sample test the terminal value technique
will outperform the cash-flow matching technique. On the other hand, it is not clear 
what happens in an out-of-sample test. This will also depend on the robustness of 
both methods. 

Since scenarios for liability cash-flows were unavailable, we implemented the
model proposed by [5]. A policy holder pays an initial premium $P_0$, which is invested
by the insurer in a corresponding portfolio of assets with value process $(A_t)_{t\in\mathcal{T}}$.
The value of the contract $(L_t)_{t\in\mathcal{T}}$ then evolves according to the following recursive
formula:

$$L_{t+1} = L_t\left(1 + \max\left\{r_G,\ p\left(\frac{A_t - L_t}{L_t} - \gamma\right)\right\}\right), \qquad t = 0,1,\dots,$$

where $r_G$ is the interest rate guaranteed to the policy holder, $p$ is the level of participation
in market value earnings and $\gamma$ is a target buffer ratio.

To generate liability cash-flows, we assumed that, starting in January 1998, the
insurance company receives one client every year up to 2012. Each client pays an
initial nominal premium of 10,000 Euros. All contracts run for 15 years. At maturity,
the value of the contract is paid to the policy holder, generating a liability cash-flow.
The portfolio in which the premia are invested consists of the Standard & Poor's
500 index, the Nikkei 225 index and the cash account. We normalized the values of the
cash account, the S&P 500 and the Nikkei 225 so that all three have value 1 Euro
in year 2012. Every year, the portfolio is rebalanced such that 80 % of the value is
invested in the cash account and 10 % in each index.
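To make the liability model concrete, the Python sketch below iterates the recursion for a single contract. It assumes the bonus rule $L_{t+1} = L_t\,(1 + \max\{r_G, p((A_t - L_t)/L_t - \gamma)\})$ with $A_t - L_t$ as the buffer, starts from $A_0 = L_0 = P_0$, and rebalances the assets every year to the 80/10/10 split described above. The default parameters are the values $r_G = 2\,\%$, $p = 0.75$, $\gamma = 0.05$ used in the study; the yearly returns passed in are hypothetical inputs, not scenario data.

```python
def simulate_contract(premium, returns, r_g=0.02, p=0.75, gamma=0.05,
                      weights=(0.8, 0.1, 0.1)):
    """Return the contract values L_0, L_1, ... of one policy.

    returns[t] holds the simple returns (cash account, S&P 500, Nikkei 225) from
    t to t + 1; the assets are rebalanced to `weights` at the start of each year.
    Assumes A_0 = L_0 = premium and the bonus rule
    L_{t+1} = L_t * (1 + max(r_g, p * ((A_t - L_t) / L_t - gamma))).
    """
    assets, liab = premium, premium
    path = [liab]
    for year_returns in returns:
        bonus = max(r_g, p * ((assets - liab) / liab - gamma))
        liab *= 1 + bonus
        assets *= 1 + sum(w * x for w, x in zip(weights, year_returns))
        path.append(liab)
    return path


# Hypothetical flat markets: only the guaranteed rate r_g is credited
print(simulate_contract(10_000.0, [(0.0, 0.0, 0.0)] * 3))
# Hypothetical strong markets: the participation term exceeds r_g in year 2
print(simulate_contract(10_000.0, [(0.5, 0.5, 0.5)] * 2))
```

The guarantee acts as a floor: whatever the returns, the contract value grows by at least the factor $1 + r_G$ per year, while a large buffer triggers participation in the excess return.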

For the construction of the replicating portfolio, we chose the same three assets. 
Cash-flows are generated by selling or buying assets every year. Since we are con- 
structing a static portfolio, the decision how many assets will be bought or sold each 
year in the future has to be made in the present. Hence, one may regard the replicating 
assets as 3 x 15 call options with strike 0, one option for each index/cash account 
and each year. We chose the cash account as the numeraire asset in the market. 

In order to make the evolution of contract values more sensitive to changes in
financial asset prices, we assumed a low guaranteed interest rate of $r_G = 2.0\,\%$, a
high participation ratio $p = 0.75$ and a low target buffer ratio $\gamma = 0.05$. From 1,000
scenarios,⁴ we chose to use 500 for the construction of the replicating portfolios and
the remaining 500 for an out-of-sample performance test. The portfolio is constructed
in year 2012. Tables 1 and 2 show the optimal portfolios and the magnitude of in-sample
and out-of-sample mismatches. The numbers in Table 1 show which quantity (in
thousands) of each asset should be bought or sold at the end of each particular
year, as well as the total initial position in year 2012. For the mismatches in Table 2, we
computed the objective value of the cash-flow matching problem for both portfolios
in-sample and out-of-sample and divided it by the fair value of liabilities. Therefore,
these numbers can be viewed as relative errors.


Table 1 Optimal replicating portfolios (in thousand Euros) for problems (RPcf) and (RPtv) and
their fair value

                        Cash-flow match                  Terminal value match
Year                    Cash account  S&P      Nikkei    Cash account  S&P      Nikkei
Total initial position  173.010       2.688    0.886     178.091       11.174   -12.047
Fair value              176.6                            177.2
2012                    14.089        0        0         0             3.062    -5.530
2013                    13.518        0.042    -0.005    0             2.739    -2.969
2014                    13.141        0.052    -0.125    0             4.511    -8.693
2015                    12.757        0.171    -0.091    0             -1.790   3.402
2016                    12.719        0.237    -0.137    0             -0.745   -3.480
2017                    12.728        0.261    -0.038    0             -0.093   -0.459
2018                    11.685        0.250    0.046     0             -1.600   4.009
2019                    11.300        0.304    0.081     0             1.269    -8.352
2020                    10.964        0.212    0.065     0             0.123    7.723
2021                    10.606        0.196    0.131     0             1.013    2.730
2022                    10.308        0.208    0.169     0             2.139    -1.091
2023                    10.155        0.269    0.287     0             2.179    -1.842
2024                    9.847         0.184    0.191     0             -1.522   4.333
2025                    9.645         0.152    0.143     0             0.828    -0.190
2026                    9.549         0.150    0.168     178.091       -0.939   -1.637

The sample fair value of liabilities for the first 500 scenarios is $1.76 \times 10^5$ Euros.


4 As scenarios were provided by a life insurance company, only this restricted number of scenarios
was available. Scenario paths for the Nikkei and the S&P indices as well as the cash account


Table 2 Values of the objective function in (RPcf) for optimal portfolios to (RPcf) and (RPtv)
relative to the fair value of liabilities

                In-sample (%)  Out-of-sample (%)
Cash-flow       8.72           9.23
Terminal value  193.2          192.8

Note that in the terminal value matching problem, all strategies
concerning purchases and sales of the cash account lead to the same objective value.
Hence, the terminal position of 178.091 could have been spread in any manner over
the years 2012-2026 without changing the result.

As one may have expected, the replicating portfolio obtained from discounted
terminal value matching matches cash payments in individual years very poorly, since
these mismatches are not penalized by the objective function of the discounted
terminal value matching problem. Consequently, a replicating portfolio obtained from
terminal value matching is of little use to the insurer if cash payments are supposed
to match well at each point in time. As already explained, the remedy is
to employ an approximation of the appropriate dynamic investment strategy in the
numeraire asset.

We implemented the linear approximation of the optimal dynamic investment
strategy as outlined at the end of Sect. 4 for the same scenarios that
were used for the portfolio optimizations (see Fig. 2). Table 3 shows the optimal
parameters $(a^1_t, a^2_t, a^3_t)_{t=1,\dots,T-1}$ and the coefficients of determination $R^2$.

At first sight, it is striking how large the coefficients of determination $R^2$ are (on
average about 80 %). However, since the optimal $\delta_t$ is a linear combination of the
discounted financial cash-flows $C^F_{CA}(t,D^F_t)/N_t$, $C^F_{SP}(t,D^F_t)/N_t$ and $C^F_{NK}(t,D^F_t)/N_t$
and the discounted liability cash-flow $C^L(t,D^F_t,D^L_t)/N_t$, this is not too surprising. Actually,
if liability cash-flows were known, i.e., available for the regression, a perfect fit (i.e.,
$R^2 = 100\,\%$) would be obtainable. Otherwise, the high values of $R^2$ indicate that the liability
cash-flow is approximated rather well by the asset cash-flows.

Analogous to Table 2, Table 4 shows the in-sample and out-of-sample objective
function values for the portfolio solving (RPcf) and the portfolio solving (RPtv)


(Footnote 4 continued) 

were generated with standard models from the Barrie and Hibbert Economic Scenario Generator 
(see www.barrhibb.com/economic_scenario_generator). 
Fig. 2 The bar chart shows cash-flows of liabilities and the optimal terminal value replicating
portfolio with and without a dynamic correction in the first ten years 


together with the dynamic investment strategy $(\hat\delta_t)_{t=1,\dots,T}$, relative to the fair value of
the liabilities.

Clearly, the dynamic strategy in the replicating assets significantly improves the 
quality of the cash-flow match. Yet, the optimal portfolio for cash-flow matching still 
slightly outperforms this dynamic variant due to the reasoning given above. 

We also included additional in-the-money call options on the cash-flows as regressors,
but this led to only a negligible improvement in-sample and out-of-sample. Possibly,
one may achieve better results with a more sophisticated choice of regressors, but that
seems unlikely or at least challenging given the high coefficients of determination.
Further, all results obtained above proved to be quite stable when changing the
number of scenarios or the specific choice of liabilities. Of course, a more detailed
analysis based on a real-world example could provide further valuable insights.
Table 3 Parameters (in thousands) obtained from linear regression and the coefficients of
determination

Year  $a^1_t$   $a^2_t$   $a^3_t$   $R^2$
2012  1.4089   -2.3976  0.6460   1
2013  1.3634   -1.9468  0.3073   0.96
2014  1.3361   -3.2633  0.9379   0.99
2015  1.2956   1.6990   -0.4615  0.87
2016  1.2954   0.8510   0.3424   0.91
2017  1.2903   0.3433   0.0117   0.18
2018  1.1768   1.4854   -0.4825  0.83
2019  1.1267   -0.6920  0.9790   0.96
2020  1.0790   0.1306   -0.8758  0.94
2021  1.0380   -0.5618  -0.2773  0.69
2022  1.0143   -1.4692  0.1697   0.61
2023  1.0078   -1.4899  0.2622   0.78
2024  0.9846   1.3168   -0.4793  0.93
2025  0.9706   -0.5706  0.0376   0.43


Table 4 Values of the objective function in (RPcf) for the optimal portfolio to (RPcf) and the
optimal portfolio to (RPtv) with strategy $(\hat\delta_t)_{t=1,\dots,T}$, relative to the fair value of liabilities

               Cash-flow (%)  T.V. w. correction (%)
In-sample      8.72           10.16
Out-of-sample  9.23           11.05


6 Conclusion 

Motivated by the theoretical results in [7], we improved the cash-flow matching 
quality of the optimal terminal value portfolio without deterioration of the terminal 
value match. This is achieved by the introduction of a deterministic strategy (e.g., in 
replicating assets or risk factors) which approximates the optimal non-deterministic 
strategy. It turned out that with the dynamic correction the terminal value matching 
technique is comparable (but still slightly inferior) to the static cash-flow matching 
technique in terms of in-sample as well as out-of-sample performance. Due to the
high coefficients of determination, a significant improvement through a more careful
choice of explanatory variables seems unlikely. Taking into account that, in contrast to
cash-flow matching, terminal value matching has an explicit analytic solution and that
the least squares problems involved in the approximation of the dynamic strategy
have negligible computational cost, this approach might represent a computationally more
efficient alternative to the standard cash-flow matching approach. Further evidence
can only be obtained by a careful examination of a real-world example.
Abstract Computation is based on models and applies algorithms. Both a model
and an algorithm can be sources of risks, which will be discussed in this paper. The
risk from the algorithm stems from erroneous results, the topic of the first part of
this paper. We attempt to give a definition of computational risk, and propose how
to avoid it. Concerning the underlying model, our concern will not be the “model
error”. Rather, even reality itself (or a perfect model) can be subject to structural
changes: Nonlinear relations of underlying laws can trigger sudden or unexpected
changes in the dynamical behavior. These phenomena must be analyzed, as far as they
are revealed by a model. A computational approach to such a structural risk will be
discussed in the second part. The paper presents some guidelines on how to limit
computational risk and assess structural risk.

Keywords Computational risk • Structural risk • Accuracy of algorithms • 
Bifurcation 

Mathematical Subject Classification 91B30 • 91G60 • 65Y20 • 65P30 


1 Computational Risk 


Early computer codes concentrated on the evaluation of special functions. The emphasis
was on delivering full accuracy (say, seven correct decimal digits on a 32-bit machine)
in minimal time. Many of these algorithms are based on formulas of [1, 6]. Later,
the interest shifted to more complex algorithms, such as solvers for differential equations,
where discretizations are required. Typically, the errors are of the type $C\Delta^p$,
where $\Delta$ represents a discretization parameter, $p$ denotes the convergence order of the
method, and $C$ is an error coefficient that is hard to assess. Controlling the error is
complicated, costly, and frequently somewhat vague, and it is a source of computational
risk.
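When the exact result is unknown, the order $p$ in an error model $C\Delta^p$ can still be estimated from three runs with step sizes $\Delta$, $\Delta/2$ and $\Delta/4$, since $(I_\Delta - I_{\Delta/2})/(I_{\Delta/2} - I_{\Delta/4}) \approx 2^p$. The Python sketch below illustrates this with the composite trapezoidal rule applied to $\int_0^1 e^x\,dx$, whose expected order is $p = 2$; the choice of method and integrand is an illustrative assumption, not taken from the text.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals of width h = (b - a) / n."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))


def observed_order(method, n):
    """Estimate p in an error model C * h^p from runs with n, 2n and 4n steps,
    using (I_h - I_{h/2}) / (I_{h/2} - I_{h/4}) ~ 2^p (no exact value needed)."""
    i1, i2, i4 = method(n), method(2 * n), method(4 * n)
    return math.log2(abs(i1 - i2) / abs(i2 - i4))


def exp_integral(n):
    """Trapezoidal approximation of the integral of exp over [0, 1]."""
    return trapezoid(math.exp, 0.0, 1.0, n)


p = observed_order(exp_integral, 8)
error = abs(exp_integral(8) - (math.e - 1.0))   # exact value known here only
print(f"observed order p = {p:.3f}, error with 8 steps = {error:.2e}")
```

The observed order is close to 2, in line with theory; the constant $C$ could then be estimated as the error divided by $\Delta^p$, but only because the exact value is available in this toy case.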
This first part of the paper discusses how to assess the risk from erroneous results
of algorithms. Accuracy properties of algorithms will have to be reconsidered. 


1.1 Efficiency of Algorithms 

The performance of algorithms can be well compared in a diagram depicting the
costs (computing time) over the achieved relative error. In case the output of an
algorithm consists of more than one real number, we take the largest of all
these errors. Now, for a certain computational task, select and run a set of algorithms,
and enter the points representing their performance into the diagram. Schematically,
the dots look as in Fig. 1.¹

For nontrivial computational tasks, there will hardly be a method that is simultaneously
both highly accurate and extremely fast; there is always a trade-off. Hence,
one will not find algorithms in the lower left corner, below the curve in Fig. 1. This
(smoothed) curve is the efficient frontier. It can be defined in the Pareto sense as
minimizing computing time and maximizing accuracy. Clearly, the aim of researchers is
to push the frontier down; the curve is not immutable in time. The smoothed frontier
in Fig. 1 may serve as an idealized vehicle to define efficiency: Each method on the
frontier is efficient.
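The Pareto notion of the efficient frontier can be made concrete with a minimal sketch. It assumes each run of an algorithm is summarized by a hypothetical triple (name, relative error, cost); the names and numbers in the usage lines are made up for illustration and are not read off Fig. 1.

```python
from typing import List, Tuple

Run = Tuple[str, float, float]  # (algorithm name, relative error, cost)


def efficient_frontier(runs: List[Run]) -> List[Run]:
    """Return the Pareto-efficient runs.

    A run is dominated if another run is at least as accurate and at least as
    cheap, and strictly better in one of the two criteria.
    """
    frontier = []
    for name, err, cost in runs:
        dominated = any(
            e2 <= err and c2 <= cost and (e2 < err or c2 < cost)
            for _, e2, c2 in runs
        )
        if not dominated:
            frontier.append((name, err, cost))
    # Order along the frontier: from cheap and inaccurate to costly and accurate.
    return sorted(frontier, key=lambda run: run[1], reverse=True)


# Hypothetical measurements; "D" is as accurate as "B-50" but more expensive.
runs = [("A", 1e-2, 0.1), ("B-50", 1e-3, 0.5), ("B-100", 1e-4, 2.0),
        ("C", 1e-5, 9.0), ("D", 1e-3, 3.0)]
print([name for name, _, _ in efficient_frontier(runs)])
```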

This notion of efficiency allows one to define the "best" algorithm for a certain task
almost uniquely. A reasonable computational accuracy must be put into relation to
the underlying model error. So, mark the size of the model error on the horizontal
axis, and let a vertical line at that position cut the efficient frontier; the intersection
determines the proper algorithm. Of course, the efficient frontier is a snapshot that
compares an artificial selection of algorithms.
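The vertical-line construction amounts to picking the cheapest frontier algorithm whose error does not exceed the model error. A minimal sketch, assuming the frontier is given as hypothetical (name, relative error, cost) triples with made-up values:

```python
from typing import Iterable, Optional, Tuple

Run = Tuple[str, float, float]  # (algorithm name, relative error, cost)


def choose_algorithm(frontier: Iterable[Run], model_error: float) -> Optional[Run]:
    """Cheapest algorithm whose relative error does not exceed the model error.

    Returns None if no algorithm on the frontier is accurate enough.
    """
    adequate = [run for run in frontier if run[1] <= model_error]
    if not adequate:
        return None
    return min(adequate, key=lambda run: run[2])


frontier = [("A", 1e-2, 0.1), ("B-50", 1e-3, 0.5), ("B-100", 1e-4, 2.0), ("C", 1e-5, 9.0)]
# Any model error in [1e-4, 1e-3) selects B-100 for these hypothetical data.
print(choose_algorithm(frontier, model_error=5e-4))
```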



Fig. 1 Costs (computing time) of algorithms over relative error 


1 An example of such a diagram for the task of pricing American-style options is
Fig. 4.19 in [14], which, however, shows the root mean square error over a set of 60 problems.
Notice that this error in the final result does not explicitly consider intermediate
errors or inconsistencies in the algorithm. For example, errors from solving linear
equations, instability caused by propagation of rounding errors, or discretization
errors do not enter explicitly. The final lumped error is seen through the eyes of the user.


1.2 Risk of an Algorithm 

Computational methods involve parameters on which the accuracy depends.
Discretizations are characterized by their fineness $M$.² For example, a binomial method
for option pricing may work with $M = 100$ or $M = 50$ time intervals. Let us
call the first algorithm B-100, and the second B-50. Here "algorithm" is understood
as an implementation in which all accuracy parameters (such as $M$) are fixed; B-100 is an
algorithm different from B-50.
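To make the notion of a hard-wired algorithm concrete, here is a minimal sketch of a binomial method for an American put with the fineness $M$ frozen in. It assumes the Cox–Ross–Rubinstein parametrization of the tree, which the text does not specify; `binomial_american_put` and `make_algorithm` are hypothetical names, and the option data in the usage lines are illustrative.

```python
import math


def binomial_american_put(S0, K, r, sigma, T, M):
    """Cox-Ross-Rubinstein binomial price of an American put with M time steps."""
    dt = T / M
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # Payoffs at maturity; index j counts the up-moves.
    values = [max(K - S0 * u**j * d**(M - j), 0.0) for j in range(M + 1)]
    for n in range(M - 1, -1, -1):
        for j in range(n + 1):             # ascending j: values[j + 1] is still from level n + 1
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            values[j] = max(cont, K - S0 * u**j * d**(n - j))
    return values[0]


def make_algorithm(M):
    """Freeze the accuracy parameter: the result has fixed cost and no tuning knob."""
    return lambda S0, K, r, sigma, T: binomial_american_put(S0, K, r, sigma, T, M)


B50, B100 = make_algorithm(50), make_algorithm(100)
print(B50(100.0, 100.0, 0.06, 0.3, 1.0), B100(100.0, 100.0, 0.06, 0.3, 1.0))
```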

Now we are prepared to define the computational risk for a given model: 
Computational Risk: 

The chosen algorithm does not deliver the required accuracy. 

For example, when an algorithm provides results with an error of 0.002 where we
required 0.001 (three decimal digits), this would, strictly speaking, be a failure. In
today's practice, such failures frequently go unnoticed. As a "safety measure", one
often chooses unnecessarily high values of the fineness $M$. This makes a failure
less likely, but leads to overshooting and a lack of efficiency. As outlined above,
we assume that the optimal algorithm is chosen such that it correctly matches the
required accuracy.
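The definition of computational risk translates into a simple check. The sketch below assumes that a sufficiently accurate reference value is available, which in practice is itself the expensive part; the function names are hypothetical.

```python
def has_computational_risk(result: float, reference: float, required_rel_error: float) -> bool:
    """True if the delivered relative error exceeds the required one (a failure)."""
    rel_error = abs(result - reference) / abs(reference)
    return rel_error > required_rel_error


def tolerance_from_digits(digits: int) -> float:
    """Required relative error for a given number of correct decimal digits."""
    return 10.0 ** (-digits)


# The situation from the text: an error of 0.002 where 0.001 (three digits) was required.
print(has_computational_risk(1.002, 1.0, tolerance_from_digits(3)))
```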


1.3 Eliminate the Risk 

It has occasionally been suggested to establish algorithms with guaranteed accuracy [9].
Such algorithms are highly involved and expensive, and hence rarely used. Although
the idea of guaranteed accuracy is not really new, it seems appropriate to push it
forward for applications in finance. For example, algorithms for option pricing have
reached a level of sophistication which may allow, as a second step, the
establishment of dependable accuracy information.

In this paper, we propose to relieve algorithms of accuracy and error
control. Rather, the algorithms should be made as fast as possible, without iterating to
convergence. As mentioned above, for each algorithm the mesh fineness $M$ will be
fixed. Then the algorithm has fixed costs, and can be regarded as an "analytic method".
The implementation matters. External fine-tuning is not available, and the computer
programs can be regarded as hard-wired.


2 Number of subintervals into which an underlying interval is subdivided by a discretization 
Table 1 Fictive entry in an accuracy file

Correct digits    Algorithm
2                 A
3                 B-50
4                 B-100
5                 C

Then these "ultimate" versions of algorithms are investigated for their accuracy.
We suggest gathering accuracy or error information into a file separate from the
algorithm. This "file" can be a look-up table, or a set of inequalities for parameters.
Typically, the accuracy results will be determined empirically. As an illustration,
the accuracy information for a certain task (say, pricing an American-style vanilla
put option) and a specific set of parameters (strike, volatility $\sigma$, interest rate $r$, time
to maturity $T$) might look as in Table 1. In an application, one then chooses the algorithm
according to the information file.
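A look-up of this kind can be sketched as follows. The digits mirror the fictive Table 1; the relative costs are hypothetical placeholders that would, in practice, be measured once on the target computer.

```python
# Hypothetical accuracy file mirroring the fictive Table 1: correct digits of each
# hard-wired algorithm for one task and one parameter set.
ACCURACY_FILE = {"A": 2, "B-50": 3, "B-100": 4, "C": 5}

# Hypothetical relative costs; in practice measured once on the target computer.
COST = {"A": 1.0, "B-50": 4.0, "B-100": 15.0, "C": 60.0}


def select_from_accuracy_file(required_digits, accuracy=ACCURACY_FILE, cost=COST):
    """Cheapest algorithm whose tabulated accuracy meets the requirement."""
    adequate = [name for name, digits in accuracy.items() if digits >= required_digits]
    if not adequate:
        raise ValueError(f"no algorithm delivers {required_digits} correct digits")
    return min(adequate, key=cost.__getitem__)


print(select_from_accuracy_file(3))
```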


1.4 Effort 


Certainly, the above suggestion amounts to a big endeavor. In general, original papers
do not contain the required accuracy information. Instead, convergence behavior,
stability, and intermediate errors are usually analyzed. Accuracy is mostly tested
on a small selection of numerical examples. It will be a challenge to researchers to
provide the additional accuracy information for "any" set of parameters. The best way
to organize this is left open. Strong results will establish inequalities for the parameters
that guarantee a certain accuracy. Weaker results will establish multidimensional
tables of discrete values of the parameters, and the application will interpolate the
accuracy.
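For the weaker results, the application interpolates tabulated accuracies. A minimal sketch with bilinear interpolation over an $(r, \sigma)$ grid follows; the grid and the digit values are made up for illustration only and are not results for any actual algorithm.

```python
from bisect import bisect_right
from typing import List


def bilinear(xs: List[float], ys: List[float], table: List[List[float]],
             x: float, y: float) -> float:
    """Bilinear interpolation of table[i][j], given at grid points (xs[i], ys[j])."""
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        raise ValueError("parameters outside the tabulated range")
    # Cell index, clamped so that the upper grid boundary uses the last cell.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j] + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1] + tx * ty * table[i + 1][j + 1])


# Made-up correct-digit values of one algorithm on an (r, sigma) grid.
R_GRID = [0.02, 0.06, 0.10]
SIGMA_GRID = [0.1, 0.3, 0.5]
DIGITS = [[3.0, 3.5, 3.8],
          [2.4, 3.0, 3.4],
          [2.0, 2.6, 3.1]]

print(bilinear(R_GRID, SIGMA_GRID, DIGITS, 0.04, 0.2))
```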

To encourage the work, let us repeat the advantages: Accuracy information and 
conditions under which algorithms fail will be included in external files. The algo- 
rithms will be slimmed down, the production runs will be faster, and the costs on a 
particular computer are fixed and known in advance. The computational risk will be 
eliminated. 


1.5 Example 

As an example, consider the pricing of a vanilla American put at the money, with one
year to maturity. We choose an algorithm that implements the analytic interpolation
method by Johnson [7].³ For the specific option problem, the remaining parameters
are $r$ and $\sigma$. Figure 2 shows the relative error in the calculated price of the
option depending on $r$ and $\sigma$, and thus implicitly a map of accuracies. For the
underlying rectangle of realistic $r, \sigma$ values and the assumed type of option, a result
can be summarized as follows:

In case $\sigma > 3r$ holds, the absolute value of the relative error is smaller than 0.005
(two and a half digits).

Fig. 2 Relative error, level curves: −0.004, −0.002, 0, +0.002

Of course, the accuracy result can be easily refined. 


2 Assessing Structural Risk 

We now turn to the second topic of the paper: how to assess structural changes in a
model computationally. This is based on dynamical systems, in which the dynamical
behavior depends on a certain model parameter. Critical threshold values of this
parameter will be decisive. Below we shall understand "structural risk" as given by
the distance to the next threshold value of the critical parameter. An early paper
stressing the role that threshold values (bifurcations) can play in a risk analysis is [11].
The approach has been applied successfully in electrical engineering for assessing
voltage collapse, see [3]. We begin by recalling some basic facts about dynamical
systems.


3 For analytic methods, strong results may be easier to obtain because implementation issues are 
less relevant. 
2.1 Simplest Attractor


The basic mean reversion equation is well known in finance: it is a stochastic
differential equation (SDE) for a stochastic process $\sigma_t$,

$$d\sigma_t = \alpha(\zeta - \sigma_t)\,dt + \gamma\sigma_t^{\delta}\,dW_t$$

with constants $\alpha, \zeta, \gamma, \delta > 0$, where $W_t$ denotes a Wiener process. This SDE is of
the type

$$d\sigma_t = f(\sigma_t)\,dt + \text{driving force}.$$

The response of $\sigma_t$ is attracted by the value of $\zeta$, which becomes apparent from a
simple stability analysis of the SDE's deterministic kernel, the ordinary differential
equation (ODE) $\dot{x} = f(x) = \alpha(\zeta - x)$. The state $x = \zeta$ is the simplest example of
an attractor.⁴
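The attraction toward $\zeta$ can be observed numerically. The sketch below integrates the mean reversion SDE with the Euler–Maruyama scheme; all parameter values in the usage lines are illustrative assumptions, and taking $|\sigma|$ inside the power is a pragmatic guard of this sketch, not part of the model.

```python
import numpy as np


def simulate_mean_reversion(alpha, zeta, gamma, delta, sigma0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of d sigma = alpha (zeta - sigma) dt + gamma sigma^delta dW.

    Returns the values sigma_T of all paths at the final time T.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sigma = np.full(n_paths, float(sigma0))
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # np.abs keeps the fractional power real if a discretized path dips below
        # zero; this guard belongs to the sketch, not to the model.
        sigma = sigma + alpha * (zeta - sigma) * dt + gamma * np.abs(sigma) ** delta * dW
    return sigma


# Illustrative parameters: starting at 0.1, the sample mean at T = 5 is close to zeta = 0.3.
sigma_T = simulate_mean_reversion(alpha=2.0, zeta=0.3, gamma=0.2, delta=0.5,
                                  sigma0=0.1, T=5.0, n_steps=500, n_paths=5000)
print(sigma_T.mean())
```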

For more flexibility, a constant (and unknown) value of $\zeta$ can be replaced by a
suitable process $\zeta_t$, which in turn is driven by some model equation. This adds a
second equation. A simple example of such a system is the tandem equation

$$d\sigma_t = \alpha_1(\zeta_t - \sigma_t)\,dt + \gamma\sigma_t\,dW_t$$
$$d\zeta_t = \alpha_2(\sigma_t - \zeta_t)\,dt$$

An ODE stability analysis of its deterministic kernel does not reveal an attractor.
Rather, the equilibrium is degenerate: the Jacobian matrix is singular. Simulating the
tandem system shows two trajectories dancing about each other, but drifting across
the phase space erratically. What is needed is some anchoring, which can be provided
by an additional nonlinear term.
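The degeneracy can be checked directly. For the deterministic kernel of the tandem, with state $(\sigma, \zeta)$, the Jacobian has rows $(-\alpha_1, \alpha_1)$ and $(\alpha_2, -\alpha_2)$; the rows are proportional, so the matrix is singular, with eigenvalues $0$ and $-(\alpha_1 + \alpha_2)$. Every point on the diagonal $\sigma = \zeta$ is an equilibrium, which is why nothing anchors the trajectories. A short numerical check, with illustrative values of $\alpha_1, \alpha_2$:

```python
import numpy as np


def tandem_jacobian(alpha1: float, alpha2: float) -> np.ndarray:
    """Jacobian of d(sigma)/dt = alpha1 (zeta - sigma), d(zeta)/dt = alpha2 (sigma - zeta),
    with the state ordered as (sigma, zeta); it does not depend on the state."""
    return np.array([[-alpha1, alpha1],
                     [alpha2, -alpha2]])


J = tandem_jacobian(1.0, 0.5)                    # illustrative parameter values
eigenvalues = np.sort(np.linalg.eigvals(J).real)
print(np.linalg.det(J), eigenvalues)             # determinant vanishes up to rounding
```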


2.2 Mean-Field Models 

We digress for a moment to emphasize that the above tandem is a mean-field model.
In canonical variables $x_1, x_2$, it is of the type

$$dx_1 = \alpha_1\left[\tfrac{1}{2}(x_1 + x_2) - x_1\right]dt + \gamma_1 x_1\,dW_t^{(1)}$$
$$dx_2 = \alpha_1\left[\tfrac{1}{2}(x_1 + x_2) - x_2\right]dt + \gamma_1 x_2\,dW_t^{(2)}$$

which generalizes to $x_1, \ldots, x_n$. The reversion is to the mean

$$\bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i,$$

a key element for modeling interaction among agents [4, 5]. More general mean-field
models include an additional nonlinear term, and are of the type

$$\dot{x} = \beta \cdot f(x) + \alpha \cdot \text{interaction} + \gamma \cdot \text{ext. forces}.$$

4 The equilibrium $x = \zeta$ is stable since $df/dx = -\alpha < 0$; for $t \to \infty$, $x$ approaches $\zeta$.

Notice that the dimension $n$ is a parameter, and the solution structure thus depends
on the number of variables. The parameter $\alpha$ measures the size of cooperation, and
$\gamma$ the strength of external random forces. The nonlinearity $f(x)$ and the balance of
the parameters $\beta, \alpha, \gamma, n$ control the dynamics.
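The interaction term can be illustrated with $n$ agents reverting to their mean. The sketch below performs Euler–Maruyama steps for $dx_i = \alpha(\bar{x} - x_i)\,dt + \gamma x_i\,dW^{(i)}$, which for $n = 2$ has the form of the canonical equations above; the nonlinear term $\beta f(x)$ is omitted, and all values are illustrative. With the random force switched off ($\gamma = 0$), the interaction terms sum to zero, so the mean is preserved while the deviations from the mean contract by the factor $1 - \alpha\,\Delta t$ per step.

```python
import numpy as np


def mean_field_step(x: np.ndarray, alpha: float, gamma: float, dt: float,
                    rng: np.random.Generator) -> np.ndarray:
    """One Euler-Maruyama step of dx_i = alpha (xbar - x_i) dt + gamma x_i dW_i."""
    xbar = x.mean()
    dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    return x + alpha * (xbar - x) * dt + gamma * x * dW


def simulate_agents(x0, alpha, gamma, dt, n_steps, seed=0):
    """Apply n_steps mean-field steps to the initial states x0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = mean_field_step(x, alpha, gamma, dt, rng)
    return x


x0 = np.linspace(0.5, 1.5, 10)                  # ten agents, illustrative start values
x_det = simulate_agents(x0, alpha=1.0, gamma=0.0, dt=0.01, n_steps=100)
print(x_det.mean(), x_det.std() / x0.std())     # mean kept; spread shrinks by 0.99**100
```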



2.3 Artificial Example 


As noted above, a suitable nonlinear term can induce a dynamic control that prevents
the trajectories from drifting around erratically. Here we choose a cubic nonlinearity
of the Duffing type $f(x) = x - x^3$, since it represents a classical bistability [13].
For slightly more flexibility, we shift the location of the equilibria by a constant $s$;
otherwise, we choose constants artificially. For the purpose of demonstration, our
artificial example is the system

$$dx_1 = 0.1\,(x_1 - s)\left[1 - (x_1 - s)^2\right]dt + 0.5\,[x_2 - x_1]\,dt + 0.1\,x_1\,dW_t$$
$$dx_2 = 0.5\,[x_1 - x_2]\,dt$$
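The artificial example can be reproduced with a few lines of Euler–Maruyama. The sketch below assumes the step size, horizon, and random seed (none are stated in the text) and returns the path of $(x_1, x_2)$; with the random term switched off, the trajectory from $(0.1, 0.1)$ settles at the node $x_1 = x_2 = s - 1 = 1$, consistent with the behavior described for Fig. 3.

```python
import numpy as np


def simulate_artificial(s=2.0, x0=(0.1, 0.1), gamma=0.1, T=150.0, dt=0.01, seed=1):
    """Euler-Maruyama path of
        dx1 = 0.1 (x1 - s)[1 - (x1 - s)^2] dt + 0.5 (x2 - x1) dt + gamma x1 dW
        dx2 = 0.5 (x1 - x2) dt
    Returns an array of shape (n_steps + 1, 2)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    path = np.empty((n_steps + 1, 2))
    x1, x2 = x0
    path[0] = x0
    for k in range(n_steps):
        u = x1 - s
        drift1 = 0.1 * u * (1.0 - u * u) + 0.5 * (x2 - x1)
        drift2 = 0.5 * (x1 - x2)
        x1, x2 = x1 + drift1 * dt + gamma * x1 * dW[k], x2 + drift2 * dt
        path[k + 1] = (x1, x2)
    return path


deterministic = simulate_artificial(gamma=0.0)   # unforced deterministic kernel
noisy = simulate_artificial(gamma=0.1)           # noise level of the system above
print(deterministic[-1], noisy[-1])
```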



Fig. 3 Artificial example of Sect. 2.3: $x_1$ and $x_2$ over time $t$, for $s = 2$, starting at 0.1

Fig. 4 $x_1, x_2$-phase plane, with the trajectory of Fig. 3, and 11 trajectories of the unforced system


Clearly, there are three ODE equilibria, namely two stable nodes at $x_1 = x_2 = s \pm 1$
and a saddle at $x_1 = x_2 = s$. For graphical illustrations of the response, see Figs. 3
and 4.

Figure 3 depicts the quick attraction of the trajectories (starting at 0.1) toward
the smaller node at $s - 1 = 1$. This dynamical response is shown again in Fig. 4 in
the $x_1, x_2$-phase portrait. As a background, this figure shows 11 trajectories of the
deterministic kernel, where the random perturbation is switched off. Starting from 11
initial points in the plane, the trajectories approach one of the two stable nodes. The
depicted part of the plane consists of two basins of attraction, separated by a separatrix that
includes the saddle. The phase portrait of the deterministic kernel serves as a skeleton
of the dynamics possible for the randomly perturbed system.
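The classification of the three equilibria follows from the eigenvalues of the Jacobian of the deterministic kernel. At $x_1 = x_2 = e$, with $u = e - s$, the Jacobian has rows $(0.1(1 - 3u^2) - 0.5,\ 0.5)$ and $(0.5,\ -0.5)$. A minimal check (the function names are ours):

```python
import numpy as np


def jacobian(e: float, s: float = 2.0) -> np.ndarray:
    """Jacobian of the deterministic kernel at the equilibrium x1 = x2 = e."""
    u = e - s
    df = 0.1 * (1.0 - 3.0 * u**2)         # derivative of 0.1 u (1 - u^2) with respect to x1
    return np.array([[df - 0.5, 0.5],
                     [0.5, -0.5]])


def classify(J: np.ndarray) -> str:
    """Classify a planar equilibrium by the eigenvalues of its Jacobian."""
    ev = np.linalg.eigvals(J)
    re, im = ev.real, ev.imag
    if np.any(np.abs(re) < 1e-12):
        return "non-hyperbolic"
    if np.any(np.abs(im) > 1e-12):
        return "stable focus" if np.all(re < 0) else "unstable focus"
    if np.all(re < 0):
        return "stable node"
    if np.all(re > 0):
        return "unstable node"
    return "saddle"


s = 2.0
for e in (s - 1, s, s + 1):
    print(e, classify(jacobian(e, s)))
```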

Now imagine increasing the strength of the random force (enlarging $\gamma$). For
sufficiently large $\gamma$, the trajectories may jump across the wall of the separatrix. Then
the dynamics is attracted by the other node. Obviously, these transitions between
the two regimes may happen repeatedly. In this way, one of the stylized facts can
be modeled, namely volatility clustering [2].⁵ This experiment underlines the
modeling power of such nonlinear systems.


2.4 Structure in Phase Spaces 

The above gentle reminder on dynamical systems has exhibited the three items node,
saddle, and separatrix. There are many more "beasts" in the phase space. The
following is an incomplete list:


5 Phases with high and low volatility are separated from each other. 
• stationary state,

• periodic behavior, 

• chaotic behavior, 

• jumps, discontinuities, 

• loss or gain of stability. 

These qualitative labels stand for the structure of dynamical responses. The structure
may change when a parameter is varied. Although a "parameter" is a constant, it may
undergo slow variations, or may be manipulated by some external (political) force.
Such changes in the "constant" parameter are called quasi-stationary. Typically, our
parameter is in the role of a control parameter. Some variations in the parameter may
have little consequence for the response of the system. But there are critical threshold
values of the parameter where changes in the structure can have dramatic
consequences. At these thresholds, small changes in the parameter can trigger essential
changes in the state of the system. The mathematical mechanism that explains
such qualitative changes is bifurcation.⁶

When a system drifts toward a bifurcation, this must be considered a risk!

Bifurcation is at the heart of systemic risk. Hence there is a need for a tool that signals
bifurcations in advance.


2.5 Risk Index 


Let $\lambda$ denote a bifurcation parameter of a dynamical system. For an underlying model,
we denote by $\lambda_0$ a numerically calculated critical threshold value of $\lambda$. At this point,
the model error enters, because $\lambda_0$ is based on the model. The distance to $\lambda_0$ is a
measure of structural risk: it is the distance between the current operation point
($\lambda$) and the closest bifurcation. To signal this distance, the risk index

$$R(\lambda) := \frac{\lambda}{|\lambda - \lambda_0| - \epsilon}$$

was suggested [12] ($\epsilon$ is a small number representing several sources of error).
The larger the value of $R$, the closer the risk. The index gives risk a quantitative
meaning, invariant under scaling of the model.⁷ A feasible range of the parameter $\lambda$
has been defined by

$$\mathcal{F}_c := \{\lambda \mid R(\lambda) < c\},$$

and its complement is the risk area of level $c$.
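The risk index and the feasible range can be evaluated directly. In the sketch below, treating the band $|\lambda - \lambda_0| \le \epsilon$, where the denominator is not positive, as infinite risk is our own convention; the value $\lambda_0 = 0.6914$ is the Hopf point of the example in Sect. 2.6, while $\epsilon$ and $c$ are illustrative choices.

```python
import math


def risk_index(lam: float, lam0: float, eps: float) -> float:
    """R(lambda) = lambda / (|lambda - lambda0| - eps).

    Inside the error band |lambda - lambda0| <= eps the denominator is not
    positive; this sketch then reports infinite risk (our convention).
    """
    gap = abs(lam - lam0) - eps
    return math.inf if gap <= 0 else lam / gap


def in_feasible_range(lam: float, lam0: float, eps: float, c: float) -> bool:
    """Membership in F_c = {lambda : R(lambda) < c}; the complement is the risk area."""
    return risk_index(lam, lam0, eps) < c


lam0, eps, c = 0.6914, 0.01, 10.0   # eps and c are illustrative
for z in (0.5, 0.65, 0.69):
    print(z, risk_index(z, lam0, eps), in_feasible_range(z, lam0, eps, c))
```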


6 For an introduction to bifurcation and related numerical methods, see [13].

7 Essentially, this is a deterministic approach. One may think of incorporating a volatility into R. 
2.6 Example

Sometimes stock prices behave cyclically, and one may ask whether there is an
underlying deterministic kernel with periodic structure. In this context, behavioral
trading models are of interest. In his model, Lux [8] splits traders into chartists and
fundamentalists, and models their impact on the price of an asset. The variables are

• $p(t)$ market price of an asset, with fundamental value $p^*$;

• $z$ proportion of chartists, and

• $x(t)$ their sentiment index, between $-1$ for pessimistic and $+1$ for optimistic.

The growth $\dot{p}$ will be proportional to $zx$ (impact of chartists) and to $(1 - z)(p^* - p)$
(impact of fundamentalists). Combining these two impacts leads to the first of the
two equations in the system


$$\dot{p} = \beta\left(z x \xi_c + (1 - z)(p^* - p)\,\xi_f\right)$$
$$\dot{x} = 2 z v_1\left(\tanh(U_1) - x\right)\cosh(U_1) + (1 - z)(1 - x^2)\,v_2\left(\sinh(U_{+f}) - \sinh(U_{-f})\right)$$

The second equation models the sentiment $x$, with incentive functions $U_1$,
$U_{+f}$, $U_{-f}$, in which a smoothed version of the absolute value $|\cdot|$ is used. The chosen constants are:

$\beta = 0.5$, $\xi_c = 5$, $\xi_f = 5$, $v_1 = 0.5$, $v_2 = 0.75$,
$\alpha_1 = 1.02$, $\alpha_2 = 0.25$, $\alpha_3 = 1.5$, $r = 0.1$, $s = 0.8$, $p^* = 10$.

This is an ODE system. The original model [8] includes a third equation for the
proportion $z$. Our modified model is simpler in that it takes $z$ as an external parameter
(our $\lambda$). The concern will be the structure of the response of the system as it varies
with $z$.

For the chosen constants, we calculate $\lambda_0 = 0.6914$ as the critical threshold value of
the parameter $z$ [10]. This is a Hopf bifurcation, at which periodic cycles are born
out of a stationary state. Accordingly, we have the two regimes

• $z < \lambda_0$: $(p, x) = (p^*, 0)$ stable stationary, and

• $z > \lambda_0$: stable periodic motion (cyclic behavior of the asset price).

At the Hopf point, there is a transition between the regimes. The risk index $R$ signals
the critical threshold by large values (Fig. 5). For the chosen constants, the threshold
occurs at a proportion of chartists of about 70 % of the traders.



Risk and Computation 


315 



Fig. 5 Risk index $R$ over parameter $z$. Left wing: index along the stationary states. Right wing: index
along the periodic states


2.7 Summary 

We summarize the second part of the paper. Provided a good model exists,⁸ we
suggest beginning by calculating the bifurcations/threshold values of parameters.
They are the pivoting points of possible trend switching. The distance between the
current operation point of the real financial system and the bifurcation point must
be observed. Large values of the risk index can be used as an indicator, signaling how
close the risk is. This can be used as a tool for a stress test.

Acknowledgments The paper has benefited from discussions with Roland C. Seydel. 
Extreme Value Importance Sampling for Rare
Event Risk Measurement


D.L. McLeish and Zhongxian Men 


Abstract We suggest practical and simple methods for Monte Carlo estimation of
the (small) probabilities of large losses using importance sampling. We argue that
a simple optimal choice of importance sampling distribution is a member of the
generalized extreme value distribution and, unlike common alternatives such as the
Esscher transform, this family achieves bounded relative error in the tail. Examples
of simulating rare event probabilities and conditional tail expectations are given, and
very large efficiency gains are achieved.

Keywords Rare event simulation • Risk measurement • Relative error • Monte Carlo 
methods • Importance sampling 


1 Introduction 

Suppose $Y = (Y_1, Y_2, \ldots, Y_m)$ is a vector of independent random variables, each
with cumulative distribution function (cdf) $F$ and probability density function (pdf)
$f$ with respect to Lebesgue measure. Suppose we wish to estimate the probability of
a large loss, $p_t = P(L(Y) > t)$, where $L(Y)$ is the loss determined by the realization
$Y$ (usually assumed to be monotonic in its components) and $t$ is some predetermined
threshold. There are many different loss functions $L(Y)$ used in rare event
simulation, including barrier hitting probabilities of sums or averages of independent
random variables, or of processes such as an Ornstein–Uhlenbeck or Feller process.
The methods discussed here are designed for problems in which a small number
of continuous factors are the primary contributors to large losses. We wish to use
importance sampling (IS) (see [3], Sect. 4.6 or [12], p. 183): generate independent
replications of $Y$ repeatedly, say $n$ times, from an alternative distribution, say one
with pdf $f_{IS}(y)$, and then estimate $p_t = E_{IS}[I(L(Y) > t)\,f(Y)/f_{IS}(Y)]$, where
$I(\cdot)$ denotes the indicator function, using the IS estimator

$$\hat{p}_t = \frac{1}{n}\sum_{j=1}^{n} I(L(Y_j) > t)\,\frac{f(Y_j)}{f_{IS}(Y_j)}, \quad \text{where } Y_j \sim f_{IS}(y). \qquad (1)$$
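Estimator (1) can be illustrated on a case with a known answer. The sketch below assumes $Y_1, \ldots, Y_m$ i.i.d. $N(0, 1)$ and $L(Y) = Y_1 + \cdots + Y_m$, so that $p_t = 1 - \Phi(t/\sqrt{m})$ exactly; as IS density it uses normals shifted to mean $t/m$, an exponential tilt chosen only for illustration (the paper argues for a different family). The function name is hypothetical.

```python
import math
import numpy as np


def is_sum_tail(t: float, m: int, n: int, seed: int = 0):
    """IS estimate (1) of p_t = P(Y_1 + ... + Y_m > t), Y_i i.i.d. N(0, 1).

    IS density: every component N(mu, 1) with mu = t/m. The likelihood ratio is
    f(Y)/f_IS(Y) = exp(-mu * sum(Y) + m * mu**2 / 2).
    Returns the estimate and its estimated relative error.
    """
    rng = np.random.default_rng(seed)
    mu = t / m
    Y = rng.normal(mu, 1.0, size=(n, m))        # Y_j ~ f_IS
    S = Y.sum(axis=1)                           # L(Y_j)
    h = (S > t) * np.exp(-mu * S + 0.5 * m * mu**2)
    estimate = h.mean()
    return estimate, h.std(ddof=1) / (estimate * math.sqrt(n))


est, rel_err = is_sum_tail(t=8.0, m=4, n=100_000)
exact = 0.5 * math.erfc(8.0 / math.sqrt(2.0 * 4))   # 1 - Phi(8 / sqrt(4))
print(est, exact, rel_err)
```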

If we denote by $E_{IS}$ the expected value under the IS distribution $f_{IS}$, and by $E$ the
expected value under the original distribution $f$, then

$$E_{IS}(\hat{p}_t) = E_{IS}\left[I(L(Y) > t)\,\frac{f(Y)}{f_{IS}(Y)}\right] = E[I(L(Y) > t)] = p_t,$$

confirming that this is an unbiased estimator. There is a great deal of literature
on such problems when the event of interest is "rare", i.e., when $p_t$ is very small,
and many different approaches depending on the underlying loss function and
distribution. We do not attempt a review of the literature in the limited space available.
Excellent reviews of the methods and applications are given in Chap. 6 of [1] and
Chap. 10 of [9]. Highly efficient methods have been developed for tail estimation in
very simple problems, such as when the loss function consists of a sum of independent
identically distributed increments. In this paper, we provide practical tools
for simulation of such problems in many examples of common interest. For rare
events, the variance or standard error is less suitable as a performance measure than
a version scaled by the mean, because in estimating very small probabilities such as
0.0001, it is not the absolute size of the error that matters but its size relative to the
true value.


Definition 1 The relative error (RE) of the importance sample estimator is the ratio 
of the estimator’s standard deviation to its mean. 

Simulation is made more difficult for rare events because crude Monte Carlo
fails. As a simple illustration, suppose we wish to estimate a very small probability
$p_t$. To this end, we generate $n$ values of $L(Y_i)$ and estimate this probability with
$\hat{p}_t = X/n$, where $X$ is the number of times that $L(Y_i) > t$, and $X$ has a Binomial$(n, p_t)$
distribution. In this case, the relative error is

$$\frac{\sqrt{\mathrm{Var}(\hat{p}_t)}}{E(\hat{p}_t)} = \frac{\sqrt{p_t(1 - p_t)/n}}{p_t} = \sqrt{\frac{1 - p_t}{n\,p_t}}.$$

For rare events, $p_t$ is small and the relative error is very large. If, for example, we want a
normal-based confidence interval for $p_t$ of the form $(p_t - 0.1p_t,\ p_t + 0.1p_t)$,
we are essentially stipulating a certain relative error ($RE = 0.1/1.96 \approx 0.05102$) whatever the
value of $p_t$. In order to achieve a reasonable bound on the relative error, we would
need sample sizes of the order of $p_t^{-1}$, i.e., larger and larger sample
sizes for rarer events.
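The growth of the crude sample size can be quantified from the binomial relative error $\sqrt{(1 - p_t)/(n p_t)}$. The sketch below (function names are ours) assumes the 95 % normal interval implied by the factor 1.96.

```python
import math


def crude_relative_error(p: float, n: int) -> float:
    """Relative error of the crude estimator X/n with X ~ Binomial(n, p)."""
    return math.sqrt((1.0 - p) / (n * p))


def crude_sample_size(p: float, target_re: float) -> int:
    """Smallest n for which crude_relative_error(p, n) <= target_re."""
    return math.ceil((1.0 - p) / (p * target_re**2))


target = 0.1 / 1.96   # interval (p - 0.1p, p + 0.1p) at the 95 % level
for p in (1e-2, 1e-4, 1e-6):
    print(p, crude_sample_size(p, target))
```

Each hundredfold decrease of $p_t$ multiplies the required sample size by roughly one hundred.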
Returning to the estimator (1), if we take its variance, we obtain

$$\mathrm{Var}_{IS}(\hat{p}_t) = n^{-1}\left[E_{IS}\left(I(L(Y) > t)\,\frac{f^2(Y)}{f_{IS}^2(Y)}\right) - p_t^2\right] = n^{-1}\left[E\left(I(L(Y) > t)\,\frac{f(Y)}{f_{IS}(Y)}\right) - p_t^2\right],$$

and the relative error is

$$RE(f_{IS}; t, n) = n^{-1/2}\left[p_t^{-2}\,E\left(I(L(Y) > t)\,\frac{f(Y)}{f_{IS}(Y)}\right) - 1\right]^{1/2}.$$

The relative error decreases in proportion to $n^{-1/2}$, but it is the factor

$$p_t^{-2}\,E\left(I(L(Y) > t)\,\frac{f(Y)}{f_{IS}(Y)}\right),$$

highly sensitive to $f_{IS}$ when $p_t$ is small, that determines whether an IS distribution
is good or bad for a given problem. There is a large literature regarding the use
of importance sampling for such problems, much of it recommending the use of the
exponential tilt or Esscher transform. The suggestion is to adopt an IS distribution
of the form

$$f_{IS}(y) = \text{constant} \times e^{\theta y} f(y) \qquad (2)$$

and then tune the parameter $\theta$ so that the IS estimator is as efficient as possible (see, for
example, [1, 7, 15]). Chapter 10 of [9] provides a detailed discussion of methods and
applications as well as a discussion of the boundedness of relative error. McLeish [11]
demonstrates that the IS distribution (2) is suboptimal and, unlike the alternatives
explored there, does not typically achieve bounded relative error. We argue for the use
of the generalized extreme value (GEV) family of distributions for such problems.
A loose paraphrase of the theme of the current paper is "all you really need¹ is
GEV". Indeed, in Appendix A, we prove a result (Proposition 1) which shows that,
under some conditions, there is always an importance sampling estimator whose
relative error is bounded in the tail, obtained by generating the distance along one
principal axis from an extreme value distribution, while leaving the other coordinates
unchanged in distribution. We now consider some one-dimensional problems.


2 The One-Dimensional Case 

Consider estimating $P(L(Y) > t)$ where the value of $t$ is large, the random variable
$Y$ is one-dimensional and $L(y)$ is monotonically increasing. We would like to use an
importance sample distribution for which, by adjusting the values of the parameters,


1 For importance sampling estimates of rare events, at least, with apologies to the Beatles. 
we can have small relative error for any large $t$. We seek a parametric family $\{f_\theta;\ \theta \in \Theta\}$
of importance sample distributions which have bounded relative error as follows:

Definition 2 Suppose $\mathcal{H}$ is the class of non-negative integrable functions $\{I(L(Y) > t);\ t > T\}$.
We say a parametric family $\{f_\theta;\ \theta \in \Theta\}$ has bounded relative error
for the class $\mathcal{H}$ if

$$\sup_{t > T}\ \inf_{\theta \in \Theta} RE(f_\theta; t, n) < \infty.$$

A parametric family has bounded relative error for estimating functions in a class
$\mathcal{H}$ if, for each $t > T$, there exists a parameter value $\theta$ which provides a relative
error bounded uniformly in $t$. Indeed, a bound on the relative error of approximately $0.738\,n^{-1/2}$ can
be achieved by importance sampling if we know the tail behavior of the distribution.
There are very few circumstances under which the exponential tilt, i.e., the family of
continuous densities of the form

$$f_\theta(y) = \text{constant} \times e^{\theta y} f(y),$$

provides bounded relative error. The literature recommending the exponential tilt
usually rests on demonstrating logarithmic efficiency (see [1], p. 159 or Sect. 10.1 of
[9]), a substantially weaker condition that does not guarantee a bound on the relative
error. Although we may design a simulation optimally for a specific criterion such as
achieving small relative error in the estimation of $P(L(Y) > t)$, we are more often
interested in the nature of the whole tail beyond $t$. For example, we may be interested
in

$$E[(L(Y) - t)\,I(L(Y) > t)] = \int_t^\infty P(L(Y) > s)\,ds,$$

and this would require that a single simulation be efficient for estimating all probabilities
$P(L(Y) > s)$, $s > t$. The property of bounded relative error provides some assurance that
the family used adapts to the whole tail, rather than to a single quantile.

For simplicity, we assume for the present that $Y$ is univariate, has a continuous
distribution, and $L(Y)$ is a strictly increasing function of $Y$. Then

$$P(L(Y) > t) = \int_{L^{-1}(t)}^{\infty} f(y)\,dy,$$

and we can achieve bounded relative error if we use an importance sample distribution
drawn from the family

$$f_\theta(y) = \text{constant} \times e^{\theta T(y)} f(y), \qquad (3)$$

where $T(y)$ behaves, for large values of $y$, roughly like a linear function of $\bar{F}(y) =
1 - F(y)$. If $T(y) \sim -\bar{F}(y)$ as $y \to \infty$, the optimal parameter is $\theta_t \approx k_2/p_t$,²
and the limit of the relative error of the IS estimator is $\approx 0.738\,n^{-1/2}$ (see Appendix
A). The simplest and most tractable family of distributions with appropriate tail
behavior is the GEV distribution associated with the density $f(y)$.

2 Here $k_2 \approx 1.5936$ is the unique positive solution to the equation $e^{-k} + k/2 = 1$.

We now provide an intuitive argument in favor of the use of the GEV family of 
IS distributions. For a more rigorous justification, see Appendix A. 

The choice $T(y) = -\bar{F}(y)$ provides asymptotic bounded relative error [11].
Consider a family of cumulative distribution functions

$$F_\theta(y) = \frac{e^{-\theta \bar{F}(y)} - e^{-\theta}}{1 - e^{-\theta}}.$$

The corresponding probability density function $F_\theta'(y)$ is of the form (3). As $\theta \to \infty$
and $y \to \infty$ in such a way that $\theta \bar{F}(y)$ converges to a nonzero constant,

$$(F(y))^\theta = (1 - \bar{F}(y))^\theta \sim e^{-\theta \bar{F}(y)}, \qquad (4)$$

so that

$$F_\theta(y) \sim (F(y))^\theta. \qquad (5)$$

Therefore, $F_\theta(y)$ is asymptotically equivalent to the distribution of the maximum
of $\theta$ observations from the original target distribution $F$. This, suitably normalized,
converges to a member of the GEV family of distributions. We also show in [11] that
the optimal parameter is asymptotic to $\theta = k_2/p_t$ as $p_t \to 0$. Consequently,

$$F_\theta(t) \sim (1 - \bar{F}(t))^\theta \sim (1 - p_t)^{k_2/p_t} \sim e^{-k_2} \approx 0.203.$$

Thus, when we use the corresponding extreme value importance sample distribution, 
about 20.3 % of the observations will fall below t and the other 79.7 % will fall above, 
and this can be used to identify one of the parameters of the IS distribution. Of course, 
only the observations greater than t are directly relevant to estimating quantities like 
P(L > t). This leads to the option of conditioning on the event L > t and using the 
generalized Pareto family (see Appendix B). 
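The constants $k_2 \approx 1.5936$, $0.738$ and $0.203$ can be reproduced numerically. Under the asymptotic approximation $\theta = k/p_t$ for the family $F_\theta$ above, a short calculation (ours, not shown in the text) gives a per-sample squared relative error of about $(e^k - 1)/k^2 - 1$; setting its derivative with respect to $k$ to zero yields exactly the equation $e^{-k} + k/2 = 1$ of footnote 2. The sketch solves that equation by bisection.

```python
import math


def solve_k2(lo: float = 1.0, hi: float = 2.0, tol: float = 1e-12) -> float:
    """Bisection for the positive root of g(k) = exp(-k) + k/2 - 1 (g(1) < 0 < g(2))."""
    def g(k: float) -> float:
        return math.exp(-k) + k / 2.0 - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


k2 = solve_k2()
re_const = math.sqrt((math.exp(k2) - 1.0) / k2**2 - 1.0)   # limit of RE * sqrt(n)
below_t = math.exp(-k2)                                    # share of IS draws below t
print(k2, re_const, below_t)
```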

The three distinct classes of extreme value distributions and some of their basic
properties are outlined in Appendix B. All have simple closed forms for their pdf, cdf,
and inverse cdf and can be easily and efficiently generated. In addition to a shape
parameter $\xi$, they have location and scale parameters $d, c$, so the cdf is $H_\xi((x - d)/c)$,
where $H_\xi(x)$ is the cdf for unit scale and location parameter 0. We say that a given
continuous cdf $F$ falls in the maximum domain of attraction of an extreme value cdf
$H_\xi(x)$ if there exist sequences $c_n, d_n$ such that

$$F^n(d_n + c_n x) \to H_\xi(x) \quad \text{as } n \to \infty.$$
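The closed forms mentioned above make the GEV family easy to work with. The sketch below implements the cdf $H_\xi((x - d)/c)$ and inverse-transform sampling from its closed-form quantile function; the function names are ours, and the standard parametrization $H_\xi(z) = \exp(-(1 + \xi z)^{-1/\xi})$ on $1 + \xi z > 0$, with the Gumbel limit $H_0(z) = \exp(-e^{-z})$, is assumed to match the one in Appendix B.

```python
import math
import random


def gev_cdf(x: float, xi: float, d: float = 0.0, c: float = 1.0) -> float:
    """H_xi((x - d)/c) in the standard GEV parametrization."""
    z = (x - d) / c
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    arg = 1.0 + xi * z
    if arg <= 0.0:                      # outside the support
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-arg ** (-1.0 / xi))


def gev_inverse_cdf(u: float, xi: float, d: float = 0.0, c: float = 1.0) -> float:
    """Closed-form quantile function, for 0 < u < 1."""
    w = -math.log(u)
    z = -math.log(w) if xi == 0.0 else (w ** (-xi) - 1.0) / xi
    return d + c * z


def gev_sample(n: int, xi: float, d: float = 0.0, c: float = 1.0, rng=random):
    """Inverse-transform sampling; redraws the (improbable) value u = 0."""
    out = []
    for _ in range(n):
        u = 0.0
        while u == 0.0:
            u = rng.random()
        out.append(gev_inverse_cdf(u, xi, d, c))
    return out


print(gev_sample(3, xi=0.0, d=4.85, c=0.2, rng=random.Random(0)))
```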

We will choose an extreme value distribution with parameters that approximate the
distribution of the maximum of $\theta_t = k_2/p_t$ random variables from the original
density $f(y)$. Further properties of the extreme value distributions and details on the
choice of parameters are given in Appendix B. Proposition 1 in Appendix A shows
that if $F$ is in the domain of attraction of $H_0$, then $H_0$ provides a family of IS
distributions with bounded relative error. It is also unique in the sense that any other
such IS distribution has tails essentially equivalent to those of $H_0$. A similar result
can be proved for $\xi \neq 0$. The superiority of the extreme value distributions for
importance sampling stems from the bound on relative error, but equally from their
ease of simulation, the simple closed form for the pdf and cdf, and the maximum
stability property, which says that the distribution of the maximum of i.i.d. random
variables drawn from this distribution is a member of the same family.

We get a better sense of the extent of the variance reduction using IS if we compare the
sample sizes required to achieve a certain relative error. If we use crude random
sampling in order to estimate $p_t$ using a sample size $n_{cr}$, the relative error of the
crude estimator is

$$RE(\text{crude}) = \frac{\sqrt{p_t(1 - p_t)/n_{cr}}}{p_t} = \sqrt{\frac{1 - p_t}{n_{cr}\,p_t}},$$

whereas if we use a GEV importance sample of size $n_{IS}$, the relative error is
$RE(\text{IS}) \approx 0.738\,n_{IS}^{-1/2}$. Equating these, the ratio of the sample sizes required for
a fixed relative error is $n_{cr}/n_{IS} \approx 1/(0.738^2\,p_t) \approx 1.84/p_t$ for $p_t$ small. Indeed, if $p_t = 10^{-4}$, an importance
sample estimator based on a sample size $5 \times 10^6$ is quite feasible on a small laptop
computer, but is roughly equivalent to a crude Monte Carlo estimator of sample size
$9.2 \times 10^{10}$, possible only on the largest computers.
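The sample-size comparison is simple arithmetic. The sketch below equates the two relative errors using the asymptotic constant 0.738 from the text; it reproduces the equivalence of $5 \times 10^6$ IS draws with roughly $9.2 \times 10^{10}$ crude draws at $p_t = 10^{-4}$.

```python
import math

RE_CONST_IS = 0.738  # asymptotic RE * sqrt(n) of the GEV importance sampler (from the text)


def crude_equivalent_size(n_is: float, p_t: float) -> float:
    """Crude Monte Carlo sample size with the same relative error as n_is IS draws.

    Solves sqrt((1 - p_t) / (n_cr * p_t)) = RE_CONST_IS / sqrt(n_is) for n_cr.
    """
    return n_is * (1.0 - p_t) / (RE_CONST_IS**2 * p_t)


print(f"{crude_equivalent_size(5e6, 1e-4):.3g}")
```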


3 Examples 

3.1 Example 1: Simulation Estimators of Quantiles and 
TailVar for the Normal Distribution 


Rarely, when we wish to simulate an expected value in the region $\{L(Y) > t\}$ of the
space, is this the only quantity of interest. More commonly, we are interested
in various functions sensitive to the tail of the distribution. This argues for using
an IS estimator with bounded relative error rather than the more common practice
of simply conditioning on the region of interest. For a simple example, suppose $Y$
follows a $N(0, 1)$ distribution and we wish to estimate a property of the tail defined
by $Y > t$, where $t$ is large. Suppose we simulate from the conditional distribution
given $Y > t$, that is, from the pdf

$$\frac{1}{1 - \Phi(t)}\,\phi(y)\,I(y > t), \qquad (6)$$
where $\phi$ and $\Phi$ are the standard normal pdf and cdf, respectively. If we also wish to estimate $P(Y > t + s \mid Y > t) \approx e^{-st - s^2/2}$ for fixed $s > 0$, sampling from this pdf is highly inefficient, since for $n$ simulations from pdf (6), the RE for estimating $P(Y > t + s \mid Y > t)$ is approximately $n^{-1/2}\sqrt{e^{st + s^2/2} - 1}$, and this grows extremely rapidly in both $t$ and $s$. We would need a sample size of around $n = 10^4 e^{st + s^2/2}$ (or about 60 trillion if $s = 3$ and $t = 6$) from the IS density (6) to achieve a RE of 1 %.
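The "60 trillion" figure follows directly from the approximation $P(Y > t+s \mid Y > t) \approx e^{-st - s^2/2}$. The Python sketch below reproduces it; the function name is ours, and the approximation to the conditional tail is the one used in the text.

```python
import math


def conditional_sampling_size(t: float, s: float, target_re: float = 0.01) -> float:
    """Sample size from pdf (6) needed to estimate q = P(Y > t + s | Y > t) with a given RE.

    With q approximated by exp(-s*t - s**2/2), the RE of the sample proportion is
    sqrt((1/q - 1) / n); solving RE = target_re for n gives the returned value.
    """
    inv_q = math.exp(s * t + 0.5 * s * s)
    return (inv_q - 1.0) / target_re ** 2


n = conditional_sampling_size(t=6.0, s=3.0)
print(f"{n:.1e}")  # about 5.9e+13, i.e. roughly 60 trillion
```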

Crude Monte Carlo fails here. The usual standard exponential tilt, or Esscher transform with $T(y) = y$, is very much better, but IS with it still fails to deliver bounded relative error. In [11], it is shown that the relative error is asymptotic to $(\pi/2)^{1/4}\sqrt{t/n} \to \infty$ as $p_t \to 0$. The IS distribution obtained by the exponential tilt is a very large improvement over crude Monte Carlo and is logarithmically efficient, but it still results in an unbounded relative error as $p_t \to 0$.
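The growth of the tilted estimator's relative error can be checked without simulation. For $Y \sim N(0,1)$ with proposal $N(t,1)$, the weight is $w(y) = \phi(y)/\phi(y-t) = e^{-ty + t^2/2}$. The second moment of $w(Y)I(Y>t)$ under the proposal equals $E_\phi[w(Y) I(Y>t)] = e^{t^2}(1-\Phi(2t))$. This Gaussian-integral derivation is ours, not quoted from [11]. The standard-library Python sketch below evaluates the resulting $n^{1/2}\,\mathrm{RE}$ exactly and prints it next to the asymptote $(\pi/2)^{1/4}\sqrt{t}$.

```python
import math


def normal_sf(x: float) -> float:
    """Standard normal upper tail 1 - Phi(x), accurate far into the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def shifted_normal_scaled_re(t: float) -> float:
    """sqrt(n) * RE of the IS estimator of p_t = P(Y > t), Y ~ N(0, 1), sampling from N(t, 1).

    E_q[w^2 I(Y > t)] = E_p[w I(Y > t)] = exp(t^2) * (1 - Phi(2t)) for w(y) = exp(-t*y + t^2/2).
    """
    p_t = normal_sf(t)
    second_moment = math.exp(t * t) * normal_sf(2.0 * t)
    return math.sqrt(second_moment / p_t ** 2 - 1.0)


for t in (3.719, 4.7534, 8.0, 16.0):
    asymptote = (math.pi / 2.0) ** 0.25 * math.sqrt(t)
    print(t, round(shifted_normal_scaled_re(t), 2), round(asymptote, 2))
```

At $t = 3.719$ and $t = 4.7534$ the exact values should come out close to the 2.05 and 2.32 quoted later in this section. Both the exact value and the asymptote keep growing with $t$.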

The Normal distribution is in the maximum domain of attraction of the Gumbel distribution ($\xi = 0$), so our arguments suggest that we should use as IS distribution

$$H_0\left(\frac{y-d}{c}\right) = \exp\left(-e^{-(y-d)/c}\right) \qquad (7)$$

with parameters $c, d$ selected to match the distribution function $(\Phi(y))^{k^2/p_t}$ (see Appendix A). Using this Gumbel distribution as an IS distribution permits a very substantial increase in efficiency.

To show how effective this is as an IS distribution, we simulate from the Gumbel distribution with cdf (7). The weight attached to a given IS simulated value of $Y$ is the ratio of the two pdfs, the standard normal and the Gumbel, or

$$w(Y) = c\,\phi(Y)\exp\left(\frac{Y-d}{c} + e^{-(Y-d)/c}\right). \qquad (8)$$

For example, with $t = 4.7534$, $p_t = 10^{-6}$ and Gumbel parameters $c = 0.20$ and $d = 4.85$, the relative error in $10^6$ simulations was $0.729\,n^{-1/2}$. We can compare this with the exponential tilt, equivalent to using the $N(t, 1)$ distribution as an IS distribution, whose relative error is $2.32\,n^{-1/2}$, or with crude Monte Carlo, with relative error around $10^3\,n^{-1/2}$.
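The Python sketch below reproduces this experiment under the stated parameters. It assumes NumPy is available; `Generator.gumbel(loc, scale)` has cdf $\exp(-e^{-(y-d)/c})$, matching (7). The weight (8) is evaluated on the log scale to avoid overflow. The function name and seed are ours, so the printed relative error will only land in the same neighbourhood as the 0.729 reported above, not reproduce it exactly.

```python
import math

import numpy as np


def gumbel_is_tail_prob(t, c, d, n, seed=0):
    """Estimate p_t = P(Y > t), Y ~ N(0, 1), by importance sampling from Gumbel(c, d).

    Returns (estimate, sqrt(n) * estimated relative error), using weight (8):
    w(y) = c * phi(y) * exp((y - d)/c + exp(-(y - d)/c)).
    """
    rng = np.random.default_rng(seed)
    y = rng.gumbel(loc=d, scale=c, size=n)
    contrib = np.zeros(n)                       # w(Y_i) * I(Y_i > t)
    tail = y > t
    u = (y[tail] - d) / c
    log_w = math.log(c) - 0.5 * y[tail] ** 2 - 0.5 * math.log(2.0 * math.pi) + u + np.exp(-u)
    contrib[tail] = np.exp(log_w)
    estimate = contrib.mean()
    return estimate, contrib.std(ddof=1) / estimate


est, scaled_re = gumbel_is_tail_prob(t=4.7534, c=0.20, d=4.85, n=10**6)
print(est, scaled_re)
```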

Suppose that our interest is in estimating the conditional tail expectation or $\mathrm{TVaR}_\alpha$ based on simulations. The $\mathrm{TVaR}_\alpha$ is defined as $E(Y \mid Y > t) = E[Y I(Y > t)]/p_t$, and we have designed the GEV parameters for simulating the numerator, $E[Y I(Y > t)]$. If we are interested in estimating $\mathrm{TVaR}_{0.0001}$ by simulation, then $t = \mathrm{VaR}_{0.0001} = 3.719$ and the true value is

$$\mathrm{TVaR}_{0.0001} = \frac{\int_{3.719}^{\infty} z\, e^{-z^2/2}\,dz}{\int_{3.719}^{\infty} e^{-z^2/2}\,dz} = 3.9585.$$

We will generate random variables $Y_i$ using the Gumbel(0.282, 3.228) distribution and then attach weights (8) to these observations. The estimate of $\mathrm{TVaR}_\alpha$ is then the average of the values $w(Y_i) \times Y_i$, taken only over those values that are greater than $t$.

Table 1 Relative error of estimators of $p_t$ and of $E[Y I(Y > t)]$

Method            n^{1/2} x RE(p_t)    n^{1/2} x RE(E[Y I(Y > t)])
1. Crude          100                  100
2. SN (tilt)      2.05                 2.05
3. EV IS          0.73                 0.73
4. Cond EV IS     0.47                 0.47
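The next Python sketch estimates $\mathrm{TVaR}_{0.0001}$ as the ratio of the two IS estimates, numerator $E[Y I(Y>t)]$ over denominator $p_t$, and compares it with the closed form $\phi(t)/(1-\Phi(t))$. It reads "Gumbel(0.282, 3.228)" as $(c, d) = (0.282, 3.228)$, following the Gumbel$(c, d)$ convention used below. It assumes NumPy; the function names and seed are ours.

```python
import math

import numpy as np


def normal_tvar_exact(t):
    """E(Y | Y > t) for Y ~ N(0, 1): phi(t) / (1 - Phi(t))."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(t / math.sqrt(2.0)))


def gumbel_is_tvar(t, c, d, n, seed=1):
    """Estimate TVaR = E[Y I(Y > t)] / P(Y > t), Y ~ N(0, 1), with Gumbel(c, d) IS and weights (8)."""
    rng = np.random.default_rng(seed)
    y = rng.gumbel(loc=d, scale=c, size=n)
    y = y[y > t]                                 # draws with Y <= t contribute zero to both sums
    u = (y - d) / c
    w = np.exp(math.log(c) - 0.5 * y ** 2 - 0.5 * math.log(2.0 * math.pi) + u + np.exp(-u))
    numerator = np.sum(w * y) / n                # estimate of E[Y I(Y > t)]
    denominator = np.sum(w) / n                  # estimate of p_t
    return numerator / denominator


print(normal_tvar_exact(3.719))
print(gumbel_is_tvar(3.719, 0.282, 3.228, 10**6))
```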

The Gumbel distribution is supported on the whole real line, while the region of interest is only that portion of the space greater than $t$, so one might generate $Y_i$ from the Gumbel distribution conditional on the event $Y_i > t$ rather than unconditionally. If $Y_i$ is distributed according to the Gumbel$(c_\theta, d_\theta)$ distribution, the probability $P(Y_i > t)$ is $1 - \exp(-e^{-(t - d_\theta)/c_\theta})$. This is typically around 0.80, so about 20 % of the time the Gumbel random variables fall in the “irrelevant” portion of the sample space $Y < t$. Since it is easy to generate from the conditional Gumbel distribution of $Y$ given $Y > t$, this was also done for a further improvement in efficiency. This conditional distribution converges to the generalized Pareto family of distributions (see Theorem 2 of Appendix B). In this case, since $\xi = 0$, $P(Y - u \le z \mid Y > u) \to 1 - e^{-z}$ as $u \to \infty$ for the standard Gumbel; for Gumbel$(c, d)$ the excess is approximately exponential with mean $c$. Therefore, in order to approximately generate from the conditional distribution of the tail of the Gumbel, we generate the excess from an exponential distribution.
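Conditional Gumbel draws can also be generated exactly by inverse-cdf sampling restricted to $(H(t), 1)$. The standard-library Python sketch below does this. For a threshold far in the Gumbel tail, the mean excess comes out close to the scale $c$, which illustrates the exponential approximation just described. The function name, seed and threshold are ours.

```python
import math
import random


def sample_gumbel_above(t, c, d, n, rng):
    """Exact draws from Gumbel(c, d) conditioned on Y > t, by inverse-cdf sampling.

    With H(y) = exp(-exp(-(y - d)/c)), draw U uniformly on (H(t), 1) and return
    H^{-1}(U) = d - c*log(-log U); -log U is computed with log1p for accuracy.
    """
    tail_mass = -math.expm1(-math.exp(-(t - d) / c))   # P(Y > t) = 1 - H(t)
    draws = []
    for _ in range(n):
        r = 1.0 - rng.random()                          # uniform on (0, 1]
        minus_log_u = -math.log1p(-tail_mass * r)       # -log U with U = 1 - tail_mass * r
        draws.append(d - c * math.log(minus_log_u))
    return draws


rng = random.Random(0)
c, d, t = 1.0, 0.0, 5.0                                 # threshold far in the Gumbel tail
ys = sample_gumbel_above(t, c, d, 100_000, rng)
mean_excess = sum(y - t for y in ys) / len(ys)
print(min(ys) >= t, mean_excess)                        # mean excess should be close to c
```

The approximate alternative in the text replaces each draw by `t + rng.expovariate(1.0 / c)`. That is cheaper, but its IS weights must then use the exponential density rather than the conditional Gumbel one.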

Table 1 provides a summary of the results of these simulations. It compares several simulation methods for estimating $\mathrm{TVaR}_\alpha = E(Y \mid Y > t) = E[Y I(Y > t)]/p_t$, with $p_t = P(Y > t)$, as well as estimates of $p_t$. Since TVaR is a ratio, we consider estimates of the denominator and numerator, i.e., $p_t$ and $E[Y I(Y > t)]$, separately. The underlying distribution of $Y$ is normal in all cases. The methods investigated are:

1. Crude Simulation (Crude). Generate independently $Y_i$, $i = 1, \ldots, n$, from the original (normal) distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} I(Y_i > t)$ and estimate $E[Y I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} Y_i I(Y_i > t)$.

2. Exponential Tilt or Shifted Normal IS (SN). Generate independently $Y_i$, $i = 1, \ldots, n$, from the $N(t, 1)$ distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} w_i I(Y_i > t)$ and estimate $E[Y I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} w_i Y_i I(Y_i > t)$, where $w_i$ are the weights, obtained as the likelihood ratio
$$w_i = w(Y_i) = \frac{\phi(Y_i)}{\phi(Y_i - t)}.$$
Since the exponential tilt, applied to a Normal distribution, results in another Normal distribution with a shifted mean and the same variance, this method is exactly the exponential tilt.

3. Extreme Value IS (EVIS). Generate independently $Y_i$, $i = 1, \ldots, n$, from the Gumbel$(c, d)$ distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} w_i I(Y_i > t)$ and estimate $E[Y I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} w_i Y_i I(Y_i > t)$, where $w_i$ are the weights, obtained as the likelihood ratio
$$w_i = w(Y_i) = \frac{\phi(Y_i)}{c^{-1} h_0\left((Y_i - d)/c\right)},$$
where $c^{-1} h_0((y - d)/c)$, with $h_0(x) = e^{-x}\exp(-e^{-x})$, is the corresponding Gumbel pdf.

4. Conditional Extreme Value IS (Cond EVIS). Generate independently $Y_i$, $i = 1, \ldots, n$, from the Gumbel$(c, d)$ distribution conditioned on $Y > t$. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} w_i I(Y_i > t) = \frac{1}{n}\sum_{i=1}^{n} w_i$ and estimate $E[Y I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} w_i Y_i I(Y_i > t) = \frac{1}{n}\sum_{i=1}^{n} w_i Y_i$, where $w_i$ are the weights, obtained as the likelihood ratio
$$w_i = w(Y_i) = \frac{\phi(Y_i)}{g(Y_i)},$$
and where
$$g(y) = \frac{c^{-1} h_0\left((y - d)/c\right)}{1 - H_0\left((t - d)/c\right)} \quad \text{for } y > t$$
is the corresponding conditional Gumbel pdf.
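The four methods can be compared in a few lines. The Python sketch below assumes NumPy and the Table 1 setting quoted just below ($t = 3.719$, $c = 0.243$, $d = 3.84$). It estimates $n^{1/2} \times \mathrm{RE}$ of each estimator of $p_t$. For method 4 it uses exact inverse-cdf sampling of the conditional Gumbel rather than the exponential-excess approximation. The seed and helper names are ours, so the output only approximates the $p_t$ column of Table 1.

```python
import math

import numpy as np

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_phi(y):
    """Log of the standard normal pdf."""
    return -0.5 * y ** 2 - LOG_SQRT_2PI


def log_gumbel_pdf(y, c, d):
    """Log of the Gumbel(c, d) pdf c^{-1} h_0((y - d)/c)."""
    u = (y - d) / c
    return -math.log(c) - u - np.exp(-u)


def scaled_re(contrib):
    """sqrt(n) times the relative error of the sample mean of contrib."""
    return contrib.std(ddof=1) / contrib.mean()


def compare_methods(t=3.719, c=0.243, d=3.84, n=10**6, seed=2):
    rng = np.random.default_rng(seed)
    out = {}
    # 1. Crude: indicators I(Y_i > t) with Y_i ~ N(0, 1)
    y = rng.standard_normal(n)
    out["Crude"] = scaled_re((y > t).astype(float))
    # 2. Shifted normal (exponential tilt): Y_i ~ N(t, 1), w = phi(y) / phi(y - t)
    y = rng.normal(t, 1.0, n)
    out["SN"] = scaled_re(np.exp(log_phi(y) - log_phi(y - t)) * (y > t))
    # 3. EVIS: Y_i ~ Gumbel(c, d), w = phi / Gumbel pdf; only Y_i > t contribute
    y = rng.gumbel(d, c, n)
    contrib = np.zeros(n)
    tail = y > t
    contrib[tail] = np.exp(log_phi(y[tail]) - log_gumbel_pdf(y[tail], c, d))
    out["EVIS"] = scaled_re(contrib)
    # 4. Cond EVIS: Y_i ~ Gumbel(c, d) given Y > t (exact inverse cdf), w = phi / g
    q = -math.expm1(-math.exp(-(t - d) / c))          # P(Y > t) under the Gumbel
    r = 1.0 - rng.random(n)                           # uniform on (0, 1]
    y = d - c * np.log(-np.log1p(-q * r))             # H^{-1}(U), U uniform on (H(t), 1)
    out["Cond EVIS"] = scaled_re(np.exp(log_phi(y) - log_gumbel_pdf(y, c, d) + math.log(q)))
    return out


print({name: round(value, 2) for name, value in compare_methods().items()})
```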

Conditional Normal IS Since we are interested in the tail behavior of the ran- 
dom variable Y given Y > t it would be natural to simulate from the conditional 
distribution Y\Y > t. Unfortunately, this is an infeasible method because it requires 
advance knowledge of p s = P(Y > s) for all s > t. 

We indicate in Table 1 the relative error of these various methods in the case $p_t = 0.0001$, $t = 3.719$. The corresponding parameters of the Gumbel distribution that we used were $c = 0.243$, $d = 3.84$, but the results are quite robust to the values of these parameters. Notice that the efficiency gain of the conditional extreme value simulation, as measured by the ratio of variances, is around $(100/0.47)^2 \approx 45{,}270$ relative to crude Monte Carlo and around $(2.05/0.47)^2 \approx 19$ relative to the exponential tilt.


3.2 Example 2: Simulating a Portfolio Credit Risk Model 


We provide a simulation of a credit risk model using importance sampling. The model, once the industry standard, is the normal copula model for portfolio credit risk, introduced in Morgan’s CreditMetrics system³ (see [5]). Under this model, the $k$'th firm defaults with probability $p_k(Z)$, and this probability depends on $m$ unobserved factors that comprise the vector $Z$. Losses on a portfolio then take the form $L = \sum_{k=1}^{v} c_k Y_k$. Here $Y_k$, the default indicator, is a Bernoulli random variable with $P(Y_k = 1) = p_k(Z)$ (denoted $Y_k \sim \mathrm{Bern}(p_k(Z))$); the $p_k$ are functions of the common factors $Z$; and $c_k$ is the portfolio exposure to the $k$'th default. Suppose we wish to estimate $P(L > t)$.

³ Very popular prior to 2008!


3.2.1 One-Factor Case 


In the simplest one-factor case, the $p_k$ are functions

$$p_k(Z) = \Phi\left(\frac{a_k Z + \Phi^{-1}(p_k)}{\sqrt{1 - a_k^2}}\right)$$

of a common standard normal $N(0, 1)$ random variable $Z$. The scalars $a_k$ are the factor loadings or weights, and $p_k$ represents the marginal probability of default (it is easy to see that $E[p_k(Z)] = p_k$).
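The identity $E[p_k(Z)] = p_k$ holds because $a_k Z + \sqrt{1 - a_k^2}\,\varepsilon$ is standard normal for independent $\varepsilon \sim N(0,1)$. The Python sketch below checks it numerically with Gauss-Hermite quadrature. It assumes NumPy (`numpy.polynomial.hermite_e.hermegauss`, whose weight function is $e^{-z^2/2}$) and the standard-library `statistics.NormalDist`; the function names are ours.

```python
import math
from statistics import NormalDist

import numpy as np

N01 = NormalDist()


def cond_default_prob(z, a, p):
    """One-factor Gaussian copula: p_k(z) = Phi((a z + Phi^{-1}(p)) / sqrt(1 - a^2))."""
    return N01.cdf((a * z + N01.inv_cdf(p)) / math.sqrt(1.0 - a * a))


def expected_cond_prob(a, p, nodes=80):
    """E[p_k(Z)] for Z ~ N(0, 1) by Gauss-Hermite quadrature (probabilists' weight e^{-z^2/2})."""
    z, w = np.polynomial.hermite_e.hermegauss(nodes)
    vals = np.array([cond_default_prob(float(zi), a, p) for zi in z])
    return float(np.dot(w, vals) / math.sqrt(2.0 * math.pi))


print(expected_cond_prob(a=0.5, p=0.02))   # close to 0.02
```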

If we wish to simulate an event $P(L > t)$ which has small probability, there are two parallel opportunities for importance sampling, both investigated by [4].

For example, for each given $Z$, we might replace the Bernoulli distribution of $Y_k$ with a distribution having higher probabilities of default, i.e., replace $p_k(Z)$ by $q_k$ where $q_k > p_k(Z)$. The choice of $q_k$ is motivated by an exponential tilt, as is argued in [4]. Conditional on the factor $Z$, the tilted Bernoulli random variables $Y_k$ are such that $E(\sum_k c_k Y_k \mid Z) = t$. We do not require the use of importance sampling in this second stage of the simulation. Having used an IS distribution for $Z$, we simply generate Bernoulli$(p_k(Z))$ random variables $Y_k$.

There are two similar alternatives in the first stage: generate $Z$ from the Gumbel, or generate $\tilde L = \sum_{k=1}^{v} c_k p_k(Z)$, a proxy for the loss, from the Gumbel distribution. These two alternatives give similar results, since the Gumbel is the extreme value distribution corresponding to both $Z$ and $\tilde L$, and $\tilde L$ is a nondecreasing function of $Z$. In Table 2, we give the results corresponding to the second of these alternatives: simulating $\tilde L = \sum_{k=1}^{v} c_k p_k(Z)$ and then solving for the factor $Z$. Unlike [4], where a shifted normal IS distribution for $Z$ is used, we use the Gumbel distribution for $\tilde L$, motivated by the arguments of Sect. 2. Extreme value importance sampling provides a very substantial variance reduction over crude simulation, of course, but also over importance sampling using the exponential tilt.

We determine appropriate parameters for the Gumbel extreme value distribution by quantile matching and then draw $Z$ from a Gumbel$(c, d)$ distribution. We use the parameters taken from the numerical example in [4], i.e., $v = 1{,}000$ obligors and exposures $c_k$ of 1, 4, 9, 16, and 25, with 200 obligors at each level of exposure. The marginal default probabilities are $p_k = 0.01(1 + \sin(16\pi k/v))$, so that they range from 0 to 2 %. The factor loadings $a_k$ were generated as uniform random variables on the interval (0, 1). In summary, the main differences from [4] are our use of the Gumbel distribution for simulating $\tilde L$ rather than the shifted normal, and the lack of a tilt for the $Y_k$.
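A minimal sketch of the first-stage idea in standard-library Python follows. A Gumbel$(c, d)$ draw of the proxy $\tilde L$ is mapped back to $Z$ by bisection, which is valid because $\tilde L$ is increasing in $Z$ when all $a_k > 0$. The point is weighted by $\phi(Z)/(h(\tilde L)\,d\tilde L/dZ)$, where $h$ is the Gumbel pdf, and the defaults are then drawn untilted. Draws of $\tilde L$ outside the range of the map contribute zero, which keeps the estimator unbiased. The helper names, bisection bracket and toy single-obligor check are our assumptions, not the authors' code.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def prepare(exposures, loadings, probs):
    """Precompute (c_k, a_k, Phi^{-1}(p_k), sqrt(1 - a_k^2)) for each obligor."""
    return [(c, a, N01.inv_cdf(p), math.sqrt(1.0 - a * a))
            for c, a, p in zip(exposures, loadings, probs)]


def proxy_loss(z, port):
    """L~(z) = sum_k c_k p_k(z); nondecreasing in z when every loading a_k > 0."""
    return sum(c * N01.cdf((a * z + b) / s) for c, a, b, s in port)


def proxy_loss_deriv(z, port):
    """dL~/dz = sum_k c_k a_k / s_k * phi((a_k z + b_k) / s_k)."""
    return sum(c * a / s * N01.pdf((a * z + b) / s) for c, a, b, s in port)


def evis_one_factor(t, port, c, d, n, rng, z_lo=-10.0, z_hi=10.0, iters=50):
    """EVIS estimate of P(L > t) in the one-factor model (sketch)."""
    lo_val, hi_val = proxy_loss(z_lo, port), proxy_loss(z_hi, port)
    total = 0.0
    for _ in range(n):
        u01 = rng.random()
        while u01 == 0.0:                        # avoid log(0); probability about 2^-53
            u01 = rng.random()
        target = d - c * math.log(-math.log(u01))    # Gumbel(c, d) draw of the proxy loss
        if not (lo_val < target < hi_val):
            continue                             # no factor value maps here: contributes 0
        lo, hi = z_lo, z_hi
        for _ in range(iters):                   # bisection, valid because L~ is monotone
            mid = 0.5 * (lo + hi)
            if proxy_loss(mid, port) < target:
                lo = mid
            else:
                hi = mid
        z = 0.5 * (lo + hi)
        slope = proxy_loss_deriv(z, port)
        if slope <= 0.0:
            continue                             # numerical underflow guard
        v = (target - d) / c
        # weight = phi(z) / (gumbel_pdf(target) * slope), gumbel_pdf = exp(-v - e^{-v}) / c
        weight = N01.pdf(z) * c * math.exp(v + math.exp(-v)) / slope
        loss = sum(ck for ck, a, b, s in port if rng.random() < N01.cdf((a * z + b) / s))
        if loss > t:
            total += weight
    return total / n


# Toy check: a single obligor with exposure 1, so P(L > 0.5) = P(Y_1 = 1) = 0.05.
port = prepare([1.0], [0.5], [0.05])
est = evis_one_factor(0.5, port, c=0.1, d=0.0, n=20_000, rng=random.Random(3))
print(est)
```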

The resulting relative errors estimated from 30,000 simulations are shown in Table 2, and evidently there is a significant variance reduction achieved by the choice of the Gumbel distribution. For example, when the threshold $t$ was chosen to be 2,000, there was a decrease in the variance by a factor of approximately $(2.33/0.69)^2$, or about 11, relative to the shifted normal IS of [4]. The decrease over crude was much more substantial, a factor of around $(15.34/0.69)^2$, or about 494.

Table 2 Relative error of estimators: Crude, shifted normal (G & L) and EVIS

t        p_t      n        n^{1/2} RE(Crude)   n^{1/2} RE(EVIS)   n^{1/2} RE(G&L)
1,500    0.0075   30,000   12.01               0.70               2.03
2,000    0.0041   30,000   15.34               0.69               2.33
3,000    0.0022   30,000   22.12               0.70               2.74
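The variance ratios quoted above follow from squaring ratios of the $n^{1/2}\,\mathrm{RE}$ columns of Table 2, since all three methods use the same sample size. A small Python sketch (dictionary layout ours):

```python
# n^{1/2} x RE values copied from Table 2, keyed by threshold t: (crude, EVIS, G&L)
table2 = {1500: (12.01, 0.70, 2.03), 2000: (15.34, 0.69, 2.33), 3000: (22.12, 0.70, 2.74)}


def variance_ratio(re_other, re_evis):
    """Efficiency gain of EVIS: ratio of variances at equal sample size."""
    return (re_other / re_evis) ** 2


for t, (crude, evis, gl) in table2.items():
    print(t, round(variance_ratio(crude, evis)), round(variance_ratio(gl, evis), 1))
```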


3.2.2 Multifactor Case 


In the multifactor case, the event that an obligor $k$ fails is determined by a Bernoulli random variable $Y_k \sim \mathrm{Bern}(p_k)$. The loss function $L = \sum_{k=1}^{v} c_k Y_k$ is then a linear function of the $Y_k$ and the corresponding exposures $c_k$. We wish to estimate the probability of a large loss: $P(L > t)$. The values $p_k$ are functions

$$p_k(Z) = \Phi\left(\frac{a_k Z + \Phi^{-1}(p_k)}{\sqrt{1 - a_k a_k^{\top}}}\right)$$

of a number of factors $Z^{\top} = (Z_1, \ldots, Z_m)$, where the individual factors $Z_i$, $i = 1, \ldots, m$, are independent standard normal random variables. Here $p_k$ is the marginal probability that obligor $k$ fails, i.e., $P(Y_k = 1) = E[p_k(Z)] = p_k$, since $a_k Z$ is $N(0, a_k a_k^{\top})$ (see [4], p. 1644). The row vectors $a_k$ are factor loadings which relate the factors to the specific obligors.


Simulation Model. We begin with a brief description of the model simulated in [4, p. 1650], which is the basis of our comparison. We assume $v = 1{,}000$ obligors, marginal probabilities of default $p_k = 0.01(1 + \sin(16\pi k/v))$, and exposures $c_k = \lceil 5k/v \rceil^2$, $k = 1, \ldots, v$. The components of the factor loading vector $a_k$ were generated as independent $U(0, 1/\sqrt{m})$, where $m$ is the number of factors. The simulation described in [4] is a two-stage Monte Carlo IS method. The first stage simulates the latent factors in $Z$ by IS. The importance distributions are independent univariate Normal distributions with means obtained by equating modes and with variances unchanged, equal to 1. Specifically, they choose the normal IS distribution having the same mode as

$$\left[1 - \Phi\left(\frac{t - E[L \mid Z = z]}{\sqrt{\mathrm{Var}[L \mid Z = z]}}\right)\right] e^{-z^{\top}z/2} \approx P(L > t \mid Z = z)\, e^{-z^{\top}z/2}, \qquad (9)$$
because this is approximately proportional to $P(Z = z \mid L > t)$, the ideal IS distribution. In other words, the IS distribution for $Z_i$ is $N(\mu_i, 1)$, $i = 1, \ldots, m$, where the vector of values of $\mu_i$ is given by (see [4], Eq. (20))

$$\mu = \arg\max_z\, P(L > t \mid Z = z)\, e^{-z^{\top}z/2} \qquad (10)$$

with (see [4], p. 1648)

$$P(L > t \mid Z = z) \approx 1 - \Phi\left(\frac{t - E[L \mid Z = z]}{\sqrt{\mathrm{Var}[L \mid Z = z]}}\right). \qquad (11)$$

Conditional on the values of the latent factors $Z$, the second stage of the algorithm in [4] is to twist the Bernoulli random variables $Y_k$ using modified Bernoulli distributions, i.e., with a suitable change in the values of the probabilities $P(Y_k = 1)$, $k = 1, \ldots, v$. Our comparisons below are with this two-stage form of the IS algorithm.
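The normal approximation (11) only needs the conditional mean $\sum_k c_k p_k(z)$ and variance $\sum_k c_k^2 p_k(z)(1 - p_k(z))$ of $L$ given $Z = z$. The standard-library Python sketch below evaluates (11). To illustrate (10), it locates the mode with a coarse grid search for a two-factor toy portfolio. The toy portfolio and the grid search are our own stand-ins; in practice any numerical optimiser could maximise (10).

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def cond_loss_moments(z, exposures, loadings, probs):
    """E[L | Z=z] and Var[L | Z=z] for L = sum_k c_k Y_k, Y_k ~ Bern(p_k(z)) independent given z."""
    mean = var = 0.0
    for c, a, p in zip(exposures, loadings, probs):
        az = sum(ai * zi for ai, zi in zip(a, z))
        s = math.sqrt(1.0 - sum(ai * ai for ai in a))
        q = N01.cdf((az + N01.inv_cdf(p)) / s)       # p_k(z)
        mean += c * q
        var += c * c * q * (1.0 - q)
    return mean, var


def tail_prob_normal_approx(t, z, exposures, loadings, probs):
    """Approximation (11): P(L > t | Z=z) ~ 1 - Phi((t - E[L|Z=z]) / sqrt(Var[L|Z=z]))."""
    mean, var = cond_loss_moments(z, exposures, loadings, probs)
    if var == 0.0:
        return 1.0 if mean > t else 0.0
    return N01.cdf((mean - t) / math.sqrt(var))      # equals 1 - Phi((t - mean) / sd)


def mode_on_grid(t, exposures, loadings, probs, half_width=4.0, step=0.1):
    """Coarse grid search for (10) with m = 2 factors: maximise log P(L>t|Z=z) - z'z/2."""
    best, best_z = -math.inf, None
    k = int(round(half_width / step))
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            z = (i * step, j * step)
            p = tail_prob_normal_approx(t, z, exposures, loadings, probs)
            if p > 0.0:
                value = math.log(p) - 0.5 * (z[0] ** 2 + z[1] ** 2)
                if value > best:
                    best, best_z = value, z
    return best_z


exposures = [1, 2, 3, 4, 5]
loadings = [(0.3, 0.2)] * 5
probs = [0.01] * 5
mu = mode_on_grid(8.0, exposures, loadings, probs)
print(mu)
```

Because every obligor in this toy example has the same loading vector, the mode should lie along the direction $(0.3, 0.2)$.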

Our simulation for this portfolio credit risk problem is a one-stage IS simulation algorithm. If there are $m$ factors in the portfolio credit risk model, we simulate $m - 1$ of them, $Z_i$, from univariate normal $N(\mu_i, 1)$, $i = 1, \ldots, m - 1$, with a different mean, as in [4]. We then simulate an approximation to the total loss, $\tilde L$, from a Gumbel distribution, and finally set $Z_m$ equal to the value implied by $Z_1, \ldots, Z_{m-1}$ and $\tilde L$. Writing $\tilde L(Z) = \sum_{k=1}^{v} c_k p_k(Z)$, this requires solving the equation

$$\tilde L(Z_1, \ldots, Z_{m-1}, Z_m) = \tilde L \qquad (12)$$

for $Z_m$. The parameters $\mu = (\mu_1, \mu_2, \ldots, \mu_{m-1})$ are obtained from the crude simulation. Having solved (12), we attach to this IS point $(Z_1, \ldots, Z_m)$ the weight

$$\omega = \left(\frac{\partial \tilde L}{\partial Z_m}\right)^{-1} \frac{c \prod_{i=1}^{m}\phi(Z_i)}{\prod_{i=1}^{m-1}\phi(Z_i - \mu_i)}\; e^{(\tilde L - d)/c} \exp\left(e^{-(\tilde L - d)/c}\right), \qquad (13)$$

where

$$\frac{\partial \tilde L}{\partial Z_m} = \sum_{k=1}^{v} \frac{c_k\, a_{k,m}}{\sqrt{1 - a_k a_k^{\top}}}\; \phi\left(\frac{a_k Z + \Phi^{-1}(p_k)}{\sqrt{1 - a_k a_k^{\top}}}\right).$$

We choose the parameters $\mu_i$, $i = 1, \ldots, m - 1$, for the above IS distributions using estimates of the quantity

$$\mu = E(Z \mid L > t) = E\left[Z \,\middle|\, \sum_{k=1}^{v} c_k p_k(Z) > t\right] \qquad (14)$$

based on the preliminary simulation, with the parameters $c, d$ of the Gumbel obtained from (24).

We summarize our algorithm for the portfolio credit risk problem as follows:

1. Conduct a crude MC simulation and estimate the parameter $\mu$ in (14).
2. Estimate the parameters $c$ and $d$ of the Gumbel distribution (24), where $E(L)$ is estimated by average$(L \mid L > t)$.
3. Repeat (a)-(d) for independent simulations $j = 1, \ldots, n$, where $n$ is the sample size of the simulation.
   (a) Generate $\tilde L$ from the Gumbel$(c, d)$ distribution.
   (b) Generate $Z_i$, $i = 1, \ldots, m - 1$, from the univariate normal $N(\mu_i, 1)$ distributions.
   (c) Solve $\tilde L(Z_1, \ldots, Z_{m-1}, Z_m) = \tilde L$ for $Z_m$ and calculate (13).
   (d) Simulate a loss $L_j = \sum_{k=1}^{v} c_k Y_k$, where $Y_k \sim \mathrm{Bern}(p_k(Z))$ with $p_k(Z) = \Phi\left(\frac{a_k Z + \Phi^{-1}(p_k)}{\sqrt{1 - a_k a_k^{\top}}}\right)$.
4. Estimate $p_t$ using the weighted average $\frac{1}{n}\sum_{j=1}^{n} \omega_j I(L_j > t)$.
5. Estimate the variance of this estimator using $n^{-1}$ times the sample variance of the values $\omega_j I(L_j > t)$, $j = 1, \ldots, n$.

Simulation Results. The results in Table 3 were obtained by using crude Monte Carlo, importance sampling using the GEV distribution as the IS distribution, and the IS approach proposed in [4]. In the crude simulations, the sample size is 50,000, while in the latter two methods, the sample size is 10,000.

Notice that for a modest number of factors there is a very large reduction in variance over the crude simulation. For example, the ratio of relative errors for 2 factors and $t = 2{,}500$ corresponds to an efficiency gain, or variance ratio, of nearly 2,400. There is also a significant improvement over the Glasserman and Li [4] simulation, with a variance ratio of approximately 4. This improvement erodes as the number of factors increases, and in fact the method of Glasserman and Li has smaller variance when $m = 10$. In general, ratios of multivariate densities of large dimension tend to be quite “noisy”: although the weights have expected value 1, they often have large variance. A subsequent paper will deal with the large dimensional case.
Table 3 Comparison between crude simulation, EVIS and Glasserman and Li (2005) for the credit risk model

Factors   t       p_t       n        n^{1/2} RE(crude)   n^{1/2} RE(EVIS)   n^{1/2} RE(G&L)
2         1,500   0.0034    50,000   17.1                0.99               1.73
2         2,000   0.0015    10,000   26.2                0.96               1.82
2         2,500   0.00038   10,000   51.3                1.05               1.93
3         1,500   0.00305   50,000   18.94               1.24               1.72
3         2,000   0.00111   10,000   31.61               1.15               1.82
3         2,500   0.00042   10,000   49.99               1.35               1.99
5         1,500   0.00289   50,000   18.87               1.39               1.71
5         2,000   0.00099   10,000   39.52               1.55               1.81
5         2,500   0.00035   10,000   55.89               1.57               1.88
10        1,500   0.00246   50,000   20.83               1.84               1.79
10        2,000   0.00081   10,000   33.70               2.15               1.89
10        2,500   0.00029   10,000   57.73               3.06               1.98


4 Conclusion 

The family of extreme value distributions is ideally suited to rare event simulation.
They provide a very tractable family of distributions and have tails which provide 
bounded relative error regardless of how rare the event is. Examples of simulating 
values of risk measures demonstrate a very substantial improvement over crude 
Monte Carlo and a smaller improvement over competitors such as the exponential 
tilt. This advantage is considerable for relatively low-dimensional problems, but there 
may be little or no advantage over an exponential tilt when the dimensionality of the 
problem increases. 

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 


Appendix A: Assumptions and Results 


We suppose without loss of generality that the argument to the loss function is a multivariate normal MVN$(0, I_m)$ random vector $Z$, since any (possibly dependent) random vector $Y$ can be generated from such a $Z$. We begin by assuming that “large” values of $L(Z)$ are determined by the distance of $Z = (Z_1, Z_2, \ldots, Z_m)$ from the origin in a specific direction, i.e.,

Assumption 1 There exists a direction vector $v \in \mathbb{R}^m$ such that, for all fixed vectors $w \in \mathbb{R}^m$,

$$\frac{P(L(Z_0 v) > t)}{P(L(Z_0 v + w) > t)} \to 1 \quad \text{as } t \to \infty, \qquad (15)$$

where $Z_0$ is $N(0, 1)$.

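We propose an importance sampling distribution generated as follows:

$$Z = Yv + (I_m - vv^{\top})\varepsilon, \quad \text{where } \varepsilon \sim \mathrm{MVN}(0, I_m), \qquad (16)$$

where $Y$, independent of $\varepsilon$, has the extreme value distribution $H_0\left(\frac{y - d}{c}\right)$. If we replace the distribution of $Y$ by the standard normal, it is easy to see that (16) gives $Z \sim \mathrm{MVN}(0, I_m)$ (for a unit vector $v$, since then $vv^{\top} + (I_m - vv^{\top}) = I_m$), so the IS weight function in this case is simply the ratio of the two univariate distributions for $Y$.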
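The covariance argument behind (16) is easy to check numerically. The NumPy sketch below builds $Z = Yv + (I_m - vv^{\top})\varepsilon$ with $Y$ standard normal and confirms that the sample covariance is close to $I_m$ and that $Z^{\top}v = Y$ exactly. To use (16) as an IS proposal one would pass Gumbel draws for `y` instead. The direction vector and seed are arbitrary choices of ours.

```python
import numpy as np


def sample_direction_mixture(v, y, rng):
    """Draw Z = Y v + (I_m - v v') eps as in (16), eps ~ MVN(0, I_m), for given draws y of Y.

    v is normalised to unit length, which is what makes Z ~ MVN(0, I_m) when Y ~ N(0, 1).
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    m = v.size
    eps = rng.standard_normal((y.size, m))
    proj = np.eye(m) - np.outer(v, v)            # symmetric projection onto the complement of v
    return y[:, None] * v[None, :] + eps @ proj  # row i is y_i v + proj eps_i


rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)                 # normal Y: (16) should then give MVN(0, I)
z = sample_direction_mixture([1.0, 2.0, 2.0], y, rng)
print(np.round(np.cov(z, rowvar=False), 2))
```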

Assumption 2 Suppose that for any fixed $w \in \mathbb{R}^m$, there exists $y_0$ such that $L(yv + w)$ is an increasing function of $y$ for $y > y_0$.

Proposition 1 Under Assumptions 1 and 2, there is a sequence of importance sampling distributions of the form (16) which provides bounded relative error asymptotic to $c\,n^{-1/2}$ as $p_t \to 0$, where $c \approx 0.738$.

In order to prove this result, we will use the following lemma, a special case of 
Corollary 1 of [11]: 

Lemma 1 Suppose the random variable $Y$ has a continuous distribution with cdf $F_Y$, and write $\bar F_Y = 1 - F_Y$. Suppose that $T(y)$ is nondecreasing and that for some real number $a$ we have $a + T(y) \sim -\bar F_Y(y)$ as $y \to y_F$, with $y_F = \sup\{y : F_Y(y) < 1\} \le \infty$. Then the IS estimator for sample size $n$ obtained from density (3) with $\theta = \theta_t = k^2/p_t$ has bounded RE asymptotic to $c\,n^{-1/2}$ as $p_t \to 0$, where $c = k^{-2}\sqrt{e^{k^2} - 1 - k^4} \approx 0.738$.⁴
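The constants $k^2 \approx 1.5936$ and $c \approx 0.738$ can be recovered numerically. Our own derivation, not quoted from [11], runs as follows. Under the IS density proportional to $e^{-\theta \bar F_Y(y)} f(y)$, the variable $\bar F_Y(Y)$ is (truncated) exponential with rate $\theta$. With $\theta = k^2/p_t$ this gives $n\,\mathrm{RE}^2 \to (e^{k^2} - 1)/k^4 - 1$, and minimising over $k^2$ yields $e^{k^2}(2 - k^2) = 2$. The standard-library Python sketch below solves that equation by bisection.

```python
import math


def optimal_k2(lo=0.5, hi=1.99, tol=1e-12):
    """Positive root of e^a (2 - a) = 2, found by bisection (f(lo) > 0 > f(hi))."""
    def f(a):
        return math.exp(a) * (2.0 - a) - 2.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def re_constant(k2):
    """c with c^2 = (e^{k^2} - 1) / k^4 - 1, i.e. e^{k^2} = 1 + k^4 (1 + c^2)."""
    return math.sqrt((math.exp(k2) - 1.0) / k2 ** 2 - 1.0)


k2 = optimal_k2()
print(round(k2, 4), round(re_constant(k2), 3))
```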

Proof of Proposition 1. The condition (15) allows us to solve an asymptotically equivalent univariate problem, i.e., estimate $P(L_1(Y) > t)$ where $L_1(Y) = L(Yv)$, $Y \sim N(0, 1)$. Clearly, the Normal distribution for $Y$ satisfies $F_Y \in \mathrm{MDA}(H_0)$, so that there exist sequences $c_n, d_n$ such that $F_Y^n(d_n + c_n x) \to H_0(x)$ as $n \to \infty$ for the GEV $H_0$. Lemma 1 shows that the importance sampling distribution

$$f_0(y) = \text{constant} \times e^{-\theta_t \bar F_Y(y)} f(y), \qquad \bar F_Y = 1 - F_Y, \qquad (17)$$

provides bounded relative error for the estimation of $p_t$ as $t \to \infty$ and $p_t \to 0$. Note that the probability density function of the maximum of $n = \theta_t + 1$ random variables drawn from the distribution $F_Y(y)$ is given by

$$n (F_Y(y))^{n-1} f(y) = \text{constant} \times (1 - \bar F_Y(y))^{\theta_t} f(y) \approx \text{constant} \times e^{-\theta_t \bar F_Y(y)} f(y). \qquad (18)$$

Furthermore, by the local limit or density version of convergence to the extreme value distributions (see Theorem 2(b) of [2] or [14]), with $y = d_n + c_n x$ and $x = (y - d_n)/c_n$,

$$n c_n F_Y^{n-1}(d_n + c_n x) f(d_n + c_n x) \to h_0(x) = \exp(-x - e^{-x}) \quad \text{as } n \to \infty,$$

which implies, combining (17) and (18), that

$$n c_n (F_Y(y))^{n-1} f(y) \approx \exp\left(-(y - d_n)/c_n - e^{-(y - d_n)/c_n}\right). \qquad (19)$$

Therefore, the extreme value distribution provides a bounded relative error importance sampling distribution, equivalent to (17).

⁴ $k^2 \approx 1.5936$ and $c \approx 0.738$ are the unique positive solutions to the equations $e^{k^2}(2 - k^2) = 2$ and $e^{k^2} = 1 + k^4(1 + c^2)$.


Appendix B: Maximum Domain of Attraction and 
Properties of The Generalized Extreme Value Distributions 

Maximum domain of attraction

If there are sequences of real constants $c_n$ and $d_n$, $n = 1, 2, \ldots$, where $c_n > 0$ for all $n$, such that

$$F^n(d_n + c_n x) \to H(x) \quad \text{as } n \to \infty, \qquad (20)$$

for some nondegenerate cdf $H(x)$, then we say that $F$ is in the maximum domain of attraction (MDA) of the cdf $H$ and write $F \in \mathrm{MDA}(H)$. The Fisher-Tippett theorem (see Theorem 7.3 of [13]) characterizes the possible limiting distributions $H$ as members of the generalized extreme value (GEV) family. A cdf is a member of this family if it has the form $H_\xi((x - d)/c)$, where $c > 0$ and

$$H_0(x) = \exp(-e^{-x}), \qquad H_\xi(x) = \exp\left(-(1 + \xi x)^{-1/\xi}\right) \text{ for } \xi \ne 0 \text{ and } \xi x > -1. \qquad (21)$$

Theorem 1 (Fisher-Tippett, Gnedenko) If $F \in \mathrm{MDA}(H)$ for some nondegenerate cdf $H$, then $H$ must take the form (21).

The properties of the GEV distributions listed in Table 4 are obtained from routine calculations and properties in [13] or [10].
Table 4 Some properties of the generalized extreme value distributions

Property                                ξ = 0                                 ξ ≠ 0 and ξ > -1
cdf H_ξ(x)                              exp(-e^{-x})                          exp(-(1 + ξx)^{-1/ξ})
pdf h_ξ(x)                              e^{-x} exp(-e^{-x})                   (1 + ξx)^{-1-1/ξ} exp(-(1 + ξx)^{-1/ξ})
Mode: satisfies H_ξ(x) = exp(-1 - ξ)    0                                     ((1 + ξ)^{-ξ} - 1)/ξ
H_ξ(x | x > x_{1/2})                    2 exp(-e^{-x}) - 1, for x > x_{1/2}   2 exp(-(1 + ξx)^{-1/ξ}) - 1, for x > x_{1/2}
Median x_{1/2}                          -ln(ln 2)                             ((ln 2)^{-ξ} - 1)/ξ
Inverse H_ξ^{-1}(p)                     -ln(-ln p)                            ((-ln p)^{-ξ} - 1)/ξ
Mean                                    γ (Euler's constant) ≈ 0.577216       (Γ(1 - ξ) - 1)/ξ if ξ < 1; ∞ if ξ ≥ 1
Variance                                π²/6 ≈ 1.645                          (Γ(1 - 2ξ) - Γ²(1 - ξ))/ξ² if ξ < 1/2; ∞ if ξ ≥ 1/2
Random number generator, U ~ U(0, 1)    -ln(-ln U)                            ((-ln U)^{-ξ} - 1)/ξ
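The last row of Table 4 gives an inverse-cdf generator. The standard-library Python sketch below implements it and compares sample medians and means with the table entries. The seeds, sample sizes and function names are ours.

```python
import math
import random


def gev_inverse(p, xi):
    """Inverse cdf H_xi^{-1}(p) from Table 4 (xi = 0 is the Gumbel case)."""
    y = -math.log(p)
    if xi == 0.0:
        return -math.log(y)
    return (y ** (-xi) - 1.0) / xi


def gev_sample(xi, n, rng):
    """Random numbers from H_xi by the inverse-cdf rule of Table 4, U ~ U(0, 1)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:                              # U = 0 would give log(0)
            out.append(gev_inverse(u, xi))
    return out


rng = random.Random(0)
gumbel = sorted(gev_sample(0.0, 200_000, rng))
print(gumbel[len(gumbel) // 2], -math.log(math.log(2.0)))   # sample vs. table median
print(sum(gumbel) / len(gumbel), 0.5772156649)               # sample vs. table mean
```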


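Choosing the parameters c and d

The GEV with location $d$ and scale $c$ has cdf $H_\xi\left(\frac{x - d}{c}\right)$ and probability density function $c^{-1} h_\xi\left(\frac{x - d}{c}\right)$. Other parameters can be easily found in the above table. We wish to choose an extreme value distribution with parameters corresponding to the maximum of a sample of $\theta_t = k^2/p_t$ random variables from the original density $f(y)$. In other words, we wish to find values of $d_{\theta_t}$ and $c_{\theta_t}$ so that

$$(F(y))^{\theta_t} \approx H_\xi\left(\frac{y - d_{\theta_t}}{c_{\theta_t}}\right), \qquad (22)$$

and this leads to matching $t$ with the quantile corresponding to $e^{-k^2} \approx 0.203$, since $(F(t))^{\theta_t} = (1 - p_t)^{k^2/p_t} \approx e^{-k^2}$. In other words, one parameter is determined by the equation

$$\frac{t - d_{\theta_t}}{c_{\theta_t}} = H_\xi^{-1}(e^{-k^2}) = \begin{cases} -\ln(k^2), & \xi = 0, \\ \dfrac{k^{-2\xi} - 1}{\xi}, & \xi \ne 0. \end{cases} \qquad (23)$$

Another parameter can be determined using the crude simulation and the values of $Y$ for which $L(Y) > t$. We can match another quantile, for example the median, the mode, or the sample mean, which estimates $E[Y \mid L(Y) > t]$. In the case of standard normally distributed inputs and the Gumbel distribution, matching the conditional expected value $E(L \mid L > t)$ (that is, setting the Gumbel mean $d + \gamma c$, with $\gamma \approx 0.5772$ Euler's constant, equal to it) and (23) results approximately in

$$c = \frac{E(L) - t}{1.0438} \quad \text{and} \quad d = t + 0.46659\,c. \qquad (24)$$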
Here $E(L) = \text{average}(L \mid L > t)$, based on a preliminary crude simulation of values of $L$ simulated under the original distribution. Of course, one could also use maximum likelihood estimation to determine appropriate parameters for the IS distribution (see [10]). However, the specific choice of estimator seemed to have little impact on the quality of the importance sampling, provided that the estimated GEV density was sufficiently dispersed.
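The rule (24) is simple to apply once a pilot run is available. The Python sketch below computes $c$ and $d$ from a list of pilot losses and checks two consequences. First, the Gumbel mean $d + \gamma c$ reproduces the pilot conditional mean, because $0.46659 + 0.57722 \approx 1.0438$. Second, $t$ sits at the $e^{-k^2} \approx 0.203$ quantile. The function names and the three-value pilot sample are illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649015329


def pilot_conditional_mean(losses, t):
    """average(L | L > t) from a crude pilot sample of losses."""
    exceed = [x for x in losses if x > t]
    if not exceed:
        raise ValueError("no pilot loss exceeds t; enlarge the pilot sample")
    return sum(exceed) / len(exceed)


def gumbel_params(t, cond_mean):
    """Gumbel scale c and location d from (24)."""
    c = (cond_mean - t) / 1.0438
    d = t + 0.46659 * c
    return c, d


def gumbel_cdf(y, c, d):
    """H_0((y - d)/c) = exp(-exp(-(y - d)/c))."""
    return math.exp(-math.exp(-(y - d) / c))


c, d = gumbel_params(2000.0, pilot_conditional_mean([1500.0, 2100.0, 2500.0], 2000.0))
print(c, d, d + EULER_GAMMA * c, gumbel_cdf(2000.0, c, d))
```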

As an alternative to simulating $L$ from the GEV, we may instead simulate from $L \mid L > t$, resulting in the generalized Pareto distribution. For a given c.d.f. $F$, the conditional excess distribution is

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, \qquad y \ge 0.$$

Then the conditional excess distribution can be approximated by the so-called generalized Pareto distribution for large values of $u$ (see [13], Theorem 7.20):

Theorem 2 (Pickands, Balkema, de Haan) $F \in \mathrm{MDA}(H_\xi)$ for some $\xi$ if and only if

$$\lim_{u \to \infty} \sup_{x} \left| F_u(x) - G_{\xi, \beta(u)}(x) \right| = 0$$

for some positive measurable function $\beta(u)$, where $G_{\xi,\beta}$ is the c.d.f. of the Generalized Pareto (GP) distribution:

$$G_{\xi,\beta}(y) = \begin{cases} 1 - (1 + \xi y/\beta)^{-1/\xi}, & \xi > 0,\ y \ge 0, \text{ or } \xi < 0,\ 0 \le y \le -\beta/\xi, \\ 1 - e^{-y/\beta}, & \xi = 0,\ y \ge 0. \end{cases} \qquad (25)$$
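Inverting (25) gives a direct sampler for the GP excess distribution: $G_{\xi,\beta}^{-1}(p) = \beta((1 - p)^{-\xi} - 1)/\xi$, or $-\beta\ln(1 - p)$ when $\xi = 0$. The standard-library Python sketch below implements it and compares sample means with the known GP mean $\beta/(1 - \xi)$ for $\xi < 1$. Function names and seeds are ours.

```python
import math
import random


def gpd_inverse(p, xi, beta):
    """Inverse of the GP cdf (25): solves G_{xi,beta}(y) = p for 0 <= p < 1."""
    if xi == 0.0:
        return -beta * math.log1p(-p)
    return beta * ((1.0 - p) ** (-xi) - 1.0) / xi


def gpd_sample(xi, beta, n, rng):
    """GP random numbers by inverse-cdf sampling (rng.random() lies in [0, 1))."""
    return [gpd_inverse(rng.random(), xi, beta) for _ in range(n)]


rng = random.Random(0)
for xi in (0.2, 0.0, -0.5):
    ys = gpd_sample(xi, 1.0, 200_000, rng)
    print(xi, sum(ys) / len(ys), 1.0 / (1.0 - xi))   # sample mean vs. beta / (1 - xi)
```

For $\xi < 0$ the draws stay inside the bounded support $[0, -\beta/\xi]$, as required by (25).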
A Note on the Numerical Evaluation
Abstract The Hartman-Watson distribution is an infinitely divisible probability law on the positive half-axis whose density is difficult to evaluate near zero. We compare three different methods to evaluate this density and show that the straightforward implementation along Yor’s explicit formula can be improved significantly by resorting to dedicated Laplace inversion algorithms. In particular, the best method seems to be an approach that is specifically designed for distributions from the Bondesson class, to which the Hartman-Watson distribution belongs. The latter approach can furthermore be extended to yield an efficient Laplace inversion algorithm for evaluating the distribution function of the Hartman-Watson law.

Keywords Hartman-Watson law • Laplace inversion • Infinitely divisible distributions • Bondesson class
1 Introduction

In the process of studying the probability distribution of the integral over a geometric Brownian motion, [14] introduced the function

$$\theta(r, x) := \frac{r}{\sqrt{2\pi^3 x}}\, e^{\pi^2/(2x)} \int_0^\infty e^{-y^2/(2x)}\, e^{-r\cosh(y)} \sinh(y) \sin\left(\frac{\pi y}{x}\right) dy \qquad (1)$$

for $r, x > 0$. Denoting by

$$I_\nu(z) := \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m + \nu + 1)} \left(\frac{z}{2}\right)^{2m + \nu} \qquad (2)$$

the modified Bessel function of the first kind, the function $f_r(x) := \theta(r, x)/I_0(r)$, $x > 0$, $r > 0$, is the density of a one-parametric probability law, say $\mu_r$, on the positive half-axis, called the Hartman-Watson law. The Hartman-Watson law arises as the first hitting time of certain diffusion processes, see [10], and is of paramount interest in mathematical finance in the context of Asian option pricing, see [2, 6, 14]. It was shown in [7] that this law is infinitely divisible with Laplace transform given by $\psi_r(u) := I_{\sqrt{2u}}(r)/I_0(r)$, $u > 0$. Moreover, it follows from a result in [10] that $\mu_r$ is not only infinitely divisible, but even within the so-called Bondesson class, which is a large subfamily of infinitely divisible laws that is introduced in and named after [5]. Notice in particular that it follows from this fact together with [13, Theorem 6.2, p. 49] that the function $\Psi_r(u) := -\log\left(I_{\sqrt{2u}}(r)/I_0(r)\right)$, $u > 0$, is a so-called complete Bernstein function, which allows for a holomorphic extension to the sliced complex plane $\mathbb{C} \setminus (-\infty, 0)$. We will make use of this observation in Sect. 5.

G. Bernhart and J.-F. Mai

Fig. 1 Left: The function $\theta(r, x)$ for three different parameters $r$ and values $x \in (0.15, 4)$. Right: The distribution function $F_r(x)$ for three different parameters $r$ and values $x \in (0.15, 10)$
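The series (2) and the Laplace transform $\psi_r(u) = I_{\sqrt{2u}}(r)/I_0(r)$ are straightforward to evaluate for moderate arguments. The standard-library Python sketch below sums (2) using the ratio of consecutive terms, $(z/2)^2/(m(m+\nu))$. It is a minimal illustration, not a production Bessel routine; for example, `math.gamma` overflows for orders above about 170. The test compares it with the closed form $I_{1/2}(z) = \sqrt{2/(\pi z)}\sinh z$.

```python
import math


def bessel_i(nu, z, terms=200, tol=1e-17):
    """Modified Bessel function of the first kind via the power series (2), for z >= 0, nu >= 0."""
    half = 0.5 * z
    term = half ** nu / math.gamma(nu + 1.0)        # m = 0 term
    total = term
    for m in range(1, terms):
        term *= half * half / (m * (m + nu))         # ratio of consecutive terms
        total += term
        if term < tol * total:
            break
    return total


def hw_laplace(u, r):
    """Laplace transform of the Hartman-Watson law: psi_r(u) = I_{sqrt(2u)}(r) / I_0(r)."""
    return bessel_i(math.sqrt(2.0 * u), r) / bessel_i(0.0, r)


print(bessel_i(0.0, 1.0), [round(hw_laplace(u, 0.5), 6) for u in (0.0, 0.5, 1.0)])
```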

It is well known that the numerical evaluation of the density of the Hartman-Watson law near zero is a challenging task because the integrand in the formula for $\theta(r, x)$ is highly oscillating. The following sections discuss several methods to evaluate the function $\theta(r, x)$ accurately. Figure 1 visualizes the function $\theta(r, x)$ for three different parameters $r$ and values $x \in (0.15, 4)$, where all numerical computation routines discussed in the present note yield exactly the same result.

When looking at Fig. 1, mathematical intuition suggests that the approximation $\theta(0.5, x) \approx 0$ for $x < 0.15$ might be a pragmatic, and numerically efficient, implementation close to zero. Nevertheless, [9] considers the numerical evaluation close to zero and obtains significant errors, see Sect. 3. Moreover, [6] studies the asymptotic behavior of $f_r(x)$ as $x \downarrow 0$, and [2] studies the behavior of the distribution function $F_r$ of $\mu_r$ as the argument tends to zero. We would like to mention that the right tail of the Hartman-Watson distribution $\mu_r$ becomes extremely heavy as $r \downarrow 0$. For instance, the distribution function $F_r(x) = \int_0^x f_r(t)\,dt$ is still significantly smaller than 1 for $x = 10$ and different $r$, see Fig. 1.

The remaining article is organized as follows. Section 2 illustrates the occurrence 
of the Hartman-Watson distribution, in particular, in mathematical finance. Section 3 
discusses the direct implementation of Formula (1). Section 4 proposes the use of the 
Gaver-Stehfest Laplace inversion technique. Section 5 proposes a complex Laplace 
inversion algorithm to numerically evaluate f r and F r . Finally, Sect. 6 concludes. 


2 Occurrence of the Hartman-Watson Law 

The most prominent occurrence of the Hartman-Watson distribution is probably in directional statistics (see [8]): if $W_t$ denotes a two-dimensional Brownian motion on the unit circle and $\tau \sim \mu_r$ is independent thereof, then $W_\tau$ has the same law as $(\cos(X), \sin(X))$, where $X$ follows the so-called von Mises distribution with parameter $r$, which has density given by

$$f_X(x) = \frac{1}{2\pi I_0(r)}\, e^{r\cos(x)}, \qquad -\pi < x < \pi.$$

The von Mises distribution is the most prominent law for an angle in the field of directional statistics, because it constitutes a tractable approximation to the “wrapped normal distribution” (i.e., the law of $Y \bmod 2\pi$ when $Y$ is normal), which is difficult to work with.

The importance of the Hartman-Watson distribution in the context of mathematical finance originates from the fact that

$$\frac{e^{-x^2/(2t)}}{\sqrt{2\pi t}}\, P(A_t \in du \mid W_t = x) = \frac{1}{u} \exp\left(-\frac{1 + e^{2x}}{2u}\right) \theta\left(\frac{e^x}{u}, t\right) du,$$

where $W_t$ denotes standard Brownian motion and $A_t = \int_0^t e^{2W_s}\,ds$ an associated integrated geometric Brownian motion, see [14]. The process $A_t$, and hence the Hartman-Watson distribution, naturally enters the scene when Asian stock derivatives, i.e., derivatives with “averaging periods,” are considered in the Black-Scholes world, see, e.g., [2, 9]. Another example, which is mathematically based on exactly the same reasoning, has recently been given in [3]: when the Black-Scholes model is enhanced by the introduction of stochastic repo margins, this leads to a convexity adjustment for all kinds of stock derivatives which involves the density of the Hartman-Watson distribution.

Let us furthermore briefly sketch a potential third application, which uses a stochastic representation for the Hartman-Watson law. Consider a diffusion process $\{X_t\}_{t \ge 0}$ satisfying the SDE
Fig. 2 Evaluation of Formula (1) for $r = 0.5$ and $x \in [0.125, 0.15]$ in MATLAB applying the built-in adaptive quadrature routine quadgk, which can handle infinite integration domains

dX '=M(F x 'sl) d ' +d '4 x »='-=-°- 

This explodes with probability one, as can be seen from Feller's test for explosion (the drift increases rapidly), i.e., there exists a stopping time $\tau \in (0, \infty)$ such that paths of $\{X_t\}$ are well defined on $[0, \tau)$ and $\lim_{t \uparrow \tau} X_t = \infty$ almost surely. Such explosive diffusions are used to model fatigue failures in solid materials: $X_t$ describes the evolution of the length of the longest crack and $\tau$ is the time point of ultimate damage. Kent [10] shows that $\tau \sim \mu_r$. We may rewrite $\tau$ as the first hitting time of zero of the stochastic process $Y_t := 1/X_t$, starting at $Y_0 = 1/r > 0$. Observing the stock price $S_0 > 0$ of a highly distressed company facing bankruptcy, it might now make sense to model the evolution of this company's stock price until default as $S_t := Y_t$, setting $r := 1/S_0$. The time of bankruptcy is defined as the first time the stock price hits zero, which then has a Hartman-Watson law. A similar model, assuming $S_t$ to follow a CEV process that is allowed to diffuse to zero, is applied in [1].


3 Straightforward Implementation Based on Formula (1) 

Regarding the exact numerical evaluation of the Hartman-Watson density, the article [9] shows that a straightforward numerical implementation of Formula (1) for $r = 0.5$ and $x \in [0.125, 0.15]$ yields significant numerical errors. In particular, Fig. 2 in [9] shows that one ends up with negative density values. We come to the same conclusion, see Fig. 2.
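Formula (1) itself is not reproduced in this section; the sketch below assumes it is Yor's integral representation $\theta(r,x) = \frac{r}{\sqrt{2\pi^3 x}}\,e^{\pi^2/(2x)}\int_0^\infty e^{-\xi^2/(2x) - r\cosh\xi}\sinh\xi\,\sin(\pi\xi/x)\,d\xi$. It evaluates this naively, with a truncated trapezoidal rule standing in for quadgk (the truncation point is a hypothetical heuristic), and also returns the prefactor. For $r = 0.5$ and $x = 0.125$ the prefactor exceeds $10^{16}$, which suggests why the oscillating integral must cancel almost completely and why double precision struggles.

```python
import math

def yor_theta(r, x, n=20000):
    """Naive evaluation of Yor's representation (assumed to be formula (1)).

    theta(r, x) = r / sqrt(2 pi^3 x) * exp(pi^2 / (2x))
                  * int_0^inf exp(-xi^2/(2x) - r cosh(xi)) sinh(xi) sin(pi xi / x) dxi

    The infinite domain is truncated at xi_max (hypothetical heuristic) and the
    integral is approximated by the trapezoidal rule. Returns (theta, prefactor)."""
    xi_max = 10.0 * math.sqrt(x) + 10.0
    h = xi_max / n
    total = 0.0
    for k in range(n + 1):
        xi = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (math.exp(-xi * xi / (2.0 * x) - r * math.cosh(xi))
                           * math.sinh(xi) * math.sin(math.pi * xi / x))
    prefactor = r / math.sqrt(2.0 * math.pi ** 3 * x) * math.exp(math.pi ** 2 / (2.0 * x))
    return prefactor * total * h, prefactor

if __name__ == "__main__":
    theta, pre = yor_theta(0.5, 0.125)
    print(theta, pre)  # the huge prefactor amplifies rounding errors in the integral
```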
4 Evaluation via Gaver-Stehfest Laplace Inversion


We apply the Gaver-Stehfest algorithm in order to obtain $\theta(r, \cdot)$ from its Laplace transform $u \mapsto I_{\sqrt{2u}}(r)$ via Laplace inversion for fixed values of $r$. For a rigorous proof and a good explanation of this method, see [11]. In particular, it is not difficult to observe from Yor's expression (1) that [11, Theorem 1(iii)] applies, which justifies the approximation


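$$\theta(r, x) \approx \frac{\log(2)}{x}\sum_{k=1}^{2n} a_k(n)\, I_{\sqrt{2k\log(2)/x}}(r) \tag{3}$$

for $n \in \mathbb{N}$ large enough, where for $k = 1, \ldots, 2n$ we have

$$a_k(n) = \frac{(-1)^{k+n}}{n!}\sum_{j=\lfloor (k+1)/2 \rfloor}^{\min\{k,n\}} j^{n+1}\binom{n}{j}\binom{2j}{j}\binom{j}{k-j}.$$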

The Gaver-Stehfest algorithm has the nice feature that only evaluations of the Laplace transform on the positive half-axis are required. In particular, the required modified Bessel function is efficient and easy to compute for $u > 0$; in MATLAB, it is available as the built-in function besseli. The drawback of the Gaver-Stehfest algorithm is that it requires high-precision arithmetic, because the involved constants $a_k(n)$ alternate in sign and become huge, which makes the sum difficult to evaluate accurately. For practical implementations, this prevents the use of large $n$, which would theoretically be desirable due to the convergence result of [11]. Nevertheless, our empirical investigation shows that $n = 10$ is still feasible on a standard PC without further precision arithmetic considerations and yields reasonable results for the considered parameterization. However, for larger values of $r$, the algorithm is less stable, as can be seen at the end of Sect. 5.
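The growth and sign alternation of the weights $a_k(n)$ can be inspected exactly with rational arithmetic. The sketch below transcribes the formula for $a_k(n)$ using Python's `Fraction` (exact, in place of a high-precision float library) and checks two identities the weights satisfy: $\sum_k a_k(n)/k = 1$ (the scheme reproduces $f \equiv 1$, whose transform is $1/s$) and $\sum_k a_k(n) = 0$.

```python
from fractions import Fraction
from math import comb, factorial

def stehfest_coefficients(n):
    """Exact Gaver-Stehfest weights a_k(n), k = 1, ..., 2n."""
    coeffs = []
    for k in range(1, 2 * n + 1):
        s = sum(j ** (n + 1) * comb(n, j) * comb(2 * j, j) * comb(j, k - j)
                for j in range((k + 1) // 2, min(k, n) + 1))
        coeffs.append(Fraction((-1) ** (k + n) * s, factorial(n)))
    return coeffs

if __name__ == "__main__":
    for n in (5, 10, 15):
        a = stehfest_coefficients(n)
        # the largest weight grows very quickly with n, which is what forces
        # high-precision arithmetic in floating-point implementations
        print(n, float(max(abs(c) for c in a)))
```

In double precision, each weight carries a relative rounding error of about $10^{-16}$, so the absolute error of the alternating sum scales with the largest weight; this is one way to see why moderate $n$ such as 10 is a practical limit.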

The obtained values of $\theta(r, x)$ are visualized in Fig. 3. Compared with the brute-force implementation in Fig. 2, the error for small $x$ becomes significantly smaller.
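A minimal double-precision sketch of approximation (3) follows. It rests on a few assumptions: $I_\nu(r)$ for real order $\nu \ge 0$ is evaluated by the power series of Eq. (2) with `math.gamma` instead of MATLAB's besseli; the helper names are hypothetical; and the generic inversion routine is checked on the Laplace pair $1/(s+1) \leftrightarrow e^{-x}$ rather than on $\theta$ itself.

```python
import math

def bessel_i(nu, r, tol=1e-16, max_terms=500):
    """I_nu(r) for real nu >= 0, r > 0, via the power series sum_m (r/2)^(2m+nu) / (m! Gamma(m+nu+1))."""
    term = (r / 2.0) ** nu / math.gamma(nu + 1.0)
    total = term
    q = (r / 2.0) ** 2
    for m in range(max_terms):
        term *= q / ((m + 1) * (m + 1 + nu))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def stehfest_weights(n):
    """Gaver-Stehfest weights a_k(n), k = 1..2n, in double precision."""
    w = []
    for k in range(1, 2 * n + 1):
        s = sum(j ** (n + 1) * math.comb(n, j) * math.comb(2 * j, j) * math.comb(j, k - j)
                for j in range((k + 1) // 2, min(k, n) + 1))
        w.append((-1) ** (k + n) * s / math.factorial(n))
    return w

def gaver_stehfest(laplace, x, n=10):
    """Approximate f(x) from its Laplace transform as in (3)."""
    ln2 = math.log(2.0)
    return ln2 / x * sum(a * laplace(k * ln2 / x)
                         for k, a in enumerate(stehfest_weights(n), start=1))

def theta_gs(r, x, n=10):
    """theta(r, x) from its Laplace transform u -> I_{sqrt(2u)}(r)."""
    return gaver_stehfest(lambda u: bessel_i(math.sqrt(2.0 * u), r), x, n)

if __name__ == "__main__":
    print(theta_gs(0.5, 0.14))
```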


5 Evaluation via a Complex Laplace Inversion Method 
for the Bondesson Class 


As already mentioned in the introduction, the Hartman-Watson law is in the Bondesson class, which allows one to apply a Laplace inversion algorithm specifically derived for such distributions in [4]. Furthermore, this method has the advantage that it immediately implies, as a corollary, a similar formula for the distribution function $F_r$. To be precise, we have the formula
Fig. 3 Evaluation of $\theta(r, x)$ for $r = 0.5$ and $x \in [0.125, 0.15]$ in MATLAB applying the Gaver-Stehfest approximation (3) with $n = 10$. Left: The y-axis is precisely the same as in Fig. 2 for comparability. Right: The y-axis is made finer to visualize smaller errors (scale $10^{-6}$)


$$\theta(r, x) = \frac{M e^{xa}}{\pi}\int_0^1 \operatorname{Im}\Big(e^{-xM\log(v)(bi-a)}\,(bi-a)\,I_{\sqrt{2(a - M\log(v)(bi-a))}}(r)\Big)\,\frac{dv}{v} \tag{4}$$

with arbitrary parameters $a, b > 0$ and $M > 2/(ax)$; this integral is a proper Riemann integral, since the integrand vanishes for $v \downarrow 0$, see [4]. Regarding the choice of the parameters, [4] have shown that $a = 1/x$, $M = 3$ is usually a good choice, and we use these parameters. For the remaining parameter $b$, we choose $b = a$. For the evaluation of the distribution function $F_r$, it is also shown in [4] that


$$F_r(x) = \frac{M e^{xa}}{\pi}\int_0^1 \operatorname{Im}\left(e^{-xM\log(v)(bi-a)}\,(bi-a)\,\frac{\psi_r\big(a - M\log(v)(bi-a)\big)}{a - M\log(v)(bi-a)}\right)\frac{dv}{v}$$

with the same parameter restrictions as above. One particular challenge with this method is that the modified Bessel function needs to be evaluated for complex order $\nu$. A straightforward implementation sufficient for our needs is achieved by using the partial sums of the series representation in Eq. (2). This has the advantage that error bounds can be computed: for $r > 0$ and $S_{\nu,n}(r) := \sum_{m=0}^{n} \frac{1}{m!\,\Gamma(m+\nu+1)}\left(\frac{r}{2}\right)^{2m+\nu}$, one can compute


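$$\big|S_{\nu,n}(r) - I_\nu(r)\big| \le \left(\frac{r}{2}\right)^{\operatorname{Re}(\nu)} \sum_{m=n+1}^{\infty} \frac{1}{m!\,|\Gamma(m+\nu+1)|}\left(\frac{r}{2}\right)^{2m}.$$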



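Using the Gamma functional equation $\Gamma(z+1) = \Gamma(z)\,z$, it is easy to see that $|\Gamma(z+1)| \ge |\Gamma(z)|$ for $|z| \ge 1$. Thus, for $n \ge -\operatorname{Re}(\nu) - 1$, we have $|m+\nu+1| \ge \operatorname{Re}(m+\nu+1) \ge 1$ for all $m \ge n+1$, so the sequence $\{|\Gamma(m+\nu+1)|\}_{m=n+1,n+2,\ldots}$ is increasing, yielding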
Fig. 4 Evaluation of $\theta(r, x)$ for $r = 0.5$ and $x \in [0.125, 0.15]$ in MATLAB applying the Laplace inversion formula (4) with $a = b = 1/x$ and $M = 3$. The modified Bessel function is implemented with accuracy $10^{-6}$. Left: The y-axis is precisely the same as in Fig. 2 for comparability. Right: The y-axis is made finer to visualize smaller errors (scale $10^{-8}$)


$$\big|S_{\nu,n}(r) - I_\nu(r)\big| \le \frac{(r/2)^{\operatorname{Re}(\nu)}}{|\Gamma(n+\nu+2)|}\sum_{m=n+1}^{\infty}\frac{1}{m!}\left(\frac{r^2}{4}\right)^{m},$$


where the series is the remainder of the Taylor expansion of $\exp(r^2/4)$, which allows for a closed-form estimate. Consequently, one is able to choose $n$ such that the modified Bessel function is approximated up to a given accuracy. Using the Gamma functional equation, one has to compute the complex Gamma function only once, which further increases efficiency. The complex Gamma function is computed using the Lanczos approximation, see [12].¹
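The error-controlled partial sums $S_{\nu,n}(r)$ can be sketched in Python as follows. Assumptions: the complex Gamma function uses a widely published Lanczos variant ($g = 7$, nine coefficients) rather than the MATLAB routine cited in the footnote; $\Gamma$ is evaluated once at $\nu + 1$, and all further values follow from the functional equation; and the Taylor remainder of $\exp(r^2/4)$ is bounded in Lagrange form by $(r^2/4)^{n+1} e^{r^2/4}/(n+1)!$, which is one possible closed-form estimate.

```python
import cmath
import math

# Lanczos coefficients (g = 7, 9 terms), a commonly used set with roughly
# 15 significant digits of accuracy.
_G = 7
_P = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
      771.32342877765313, -176.61502916214059, 12.507343278686905,
      -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7]

def cgamma(z):
    """Complex Gamma function via the Lanczos approximation (reflection for Re z < 1/2)."""
    z = complex(z)
    if z.real < 0.5:
        return cmath.pi / (cmath.sin(cmath.pi * z) * cgamma(1.0 - z))
    z -= 1.0
    x = _P[0]
    for i in range(1, _G + 2):
        x += _P[i] / (z + i)
    t = z + _G + 0.5
    return cmath.sqrt(2.0 * cmath.pi) * t ** (z + 0.5) * cmath.exp(-t) * x

def bessel_i_complex(nu, r, tol=1e-12, max_terms=1000):
    """Partial sum S_{nu,n}(r) of series (2) for complex order nu and real r > 0.

    Stops at the first n >= -Re(nu) - 1 for which the a-priori bound
        (r/2)^Re(nu) / |Gamma(n+nu+2)| * (r^2/4)^(n+1) / (n+1)! * exp(r^2/4)
    is below tol. Returns (S_{nu,n}(r), bound)."""
    nu = complex(nu)
    q = (r / 2.0) ** 2
    gam = cgamma(nu + 1.0)            # the only Gamma evaluation
    term = (r / 2.0) ** nu / gam      # m = 0 term of the series
    total = term
    n_min = max(0, math.ceil(-nu.real - 1.0))
    gam_next = gam * (nu + 1.0)       # Gamma(n + nu + 2) for n = 0
    taylor = q                        # (r^2/4)^(n+1) / (n+1)! for n = 0
    n = 0
    while n < max_terms:
        bound = (r / 2.0) ** nu.real / abs(gam_next) * taylor * math.exp(q)
        if n >= n_min and bound < tol:
            return total, bound
        term *= q / ((n + 1) * (n + 1 + nu))   # term m = n + 1
        total += term
        n += 1
        gam_next *= n + nu + 1.0               # Gamma(n + nu + 2) for the new n
        taylor *= q / (n + 1)
    raise RuntimeError("series did not reach the requested accuracy")
```

The closed forms $I_{\pm 1/2}(r) = \sqrt{2/(\pi r)}\,\{\sinh r, \cosh r\}$ and the recurrence $I_{\nu-1}(r) - I_{\nu+1}(r) = (2\nu/r) I_\nu(r)$ provide convenient checks for real and complex orders.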


Figure 4 shows the resulting values of $\theta(r, x)$, where the modified Bessel function is approximated with accuracy $10^{-6}$. Formula (4) is evaluated in MATLAB applying the built-in adaptive quadrature routine quadgk. Compared with the Gaver-Stehfest inversion, the error for small $x$ is again significantly reduced, and the results can be improved even further by increasing the accuracy of the modified Bessel function.
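Formula (4) and the analogous formula for $F_r$ only require the Laplace transform along a ray in the complex plane. The sketch below implements both with $a = b = 1/x$ and $M = 3$ as recommended above, using a composite Simpson rule on $[0, 1]$ in place of quadgk (an assumption made for self-containedness). It is checked on the exponential distribution, a gamma law and hence a member of the Bondesson class, whose transform $1/(1+s)$ is known in closed form. For $\theta(r, \cdot)$ one would pass $s \mapsto I_{\sqrt{2s}}(r)$, evaluated with a complex-order Bessel routine such as the partial sums $S_{\nu,n}(r)$ described in the text.

```python
import cmath
import math

def _bondesson_integral(phi, x, M=3.0, n=4000):
    """(M e^{xa}/pi) * int_0^1 Im(e^{-x M log(v)(bi-a)} (bi-a) phi(a - M log(v)(bi-a))) dv/v
    with a = b = 1/x, evaluated by the composite Simpson rule (n even)."""
    a = b = 1.0 / x
    c = complex(-a, b)  # the factor (b i - a)

    def integrand(v):
        if v == 0.0:
            return 0.0  # the integrand vanishes as v -> 0 (needs M > 2/(a x))
        lv = math.log(v)
        s = a - M * lv * c
        return (cmath.exp(-x * M * lv * c) * c * phi(s)).imag / v

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return M * math.exp(x * a) / math.pi * total * h / 3.0

def bondesson_density(laplace, x, **kw):
    """Density at x > 0 via formula (4) for a general Laplace transform."""
    return _bondesson_integral(laplace, x, **kw)

def bondesson_cdf(laplace, x, **kw):
    """Distribution function at x > 0: same formula with phi(s) / s."""
    return _bondesson_integral(lambda s: laplace(s) / s, x, **kw)

if __name__ == "__main__":
    exp_lt = lambda s: 1.0 / (1.0 + s)
    print(bondesson_density(exp_lt, 1.0), math.exp(-1.0))
    print(bondesson_cdf(exp_lt, 1.0), 1.0 - math.exp(-1.0))
```

With $a = 1/x$, the factor $e^{-xM\log(v)(bi-a)}/v$ behaves like $v^{M-1}$ near zero, so the integrand is damped there and a fixed-step rule suffices for this smooth example.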


A second comparison of the presented methods is included for a larger value of $r$. The Laplace inversion method for the Bondesson class is the most stable and accurate algorithm, as can be seen in Fig. 5, which visualizes the values of $\theta(r, x)$ for small $x$ and $r = 3$. Whereas the straightforward implementation based on Formula (1) fails due to numerical problems and the choice $n = 10$ is not ideal for the Gaver-Stehfest Laplace inversion, the Bondesson method yields stable results.


¹ We use the implementation of P. Godfrey published on http://www.mathworks.com/matlabcentral/fileexchange/3572-gamma.
Fig. 5 Evaluation of $\theta(r, x)$ for $r = 3$ and $x \in [0.125, 0.15]$ using the three presented methods with the same specifications as before, i.e., the Gaver-Stehfest approximation (3) with $n = 10$ and the Laplace inversion formula (4) with $a = b = 1/x$ and $M = 3$


6 Conclusion 

We compared three different methods to numerically evaluate the density of the Hartman-Watson law. We found that Laplace inversion algorithms significantly outperform a direct implementation of Yor's formula (1). Moreover, a dedicated algorithm for distributions of the Bondesson class was proposed to numerically evaluate the distribution function of the Hartman-Watson law efficiently.

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 
Computation of Copulas by Fourier Methods


Antonis Papapantoleon 


Abstract We provide an integral representation for the (implied) copulas of dependent random variables in terms of their moment generating functions. The proof uses ideas from Fourier methods for option pricing. This representation can be used for a large class of models from mathematical finance, including Lévy and affine processes. As an application, we compute the implied copula of the NIG Lévy process, which exhibits notable time-dependence.


1 Introduction 

Copulas provide a complete characterization of the dependence structure between random variables and link in a very elegant way the joint distribution with the marginal distributions via Sklar's theorem. However, they are a rather static concept and do not blend well with stochastic processes, which can be used to describe the random evolution of dependent quantities, e.g., the evolution of several stock prices. Therefore, other methods to create dependence in stochastic models have been developed. Multivariate stochastic processes spring immediately to mind, for example, Lévy or affine processes (cf., e.g., [3, 4, 17] or [15]), while in mathematical finance models using time changes or linear mixture models have been developed; see, e.g., [5, 10, 12, 13] or [11], to mention just a small part of the existing literature. In these approaches, however, the copula is typically not known explicitly. Another very interesting approach is due to [9], who introduced Lévy copulas to characterize the dependence structure of Lévy processes.

In this note, we provide a new representation for the (implied) copula of a multidimensional random variable in terms of its moment generating function. The derivation of the main result borrows ideas from Fourier methods for option pricing, and the motivation stems from the knowledge of the moment generating function in most of the aforementioned models. This paper is organized as follows: in Sect. 2 we provide the representation of the copula in terms of the moment generating function; the results are proved for random variables for simplicity, while stochastic processes are considered as a corollary. In Sect. 3, we provide two examples to showcase how this method can be applied, for example, in performing sensitivity analysis of the copula with respect to the parameters of the model. Finally, Sect. 4 concludes with some remarks.


2 Copulas via Fourier Transform Methods 

Let $\mathbb{R}^n$ denote the $n$-dimensional Euclidean space, $\langle \cdot, \cdot \rangle$ the Euclidean scalar product and $\mathbb{R}^n_-$ the negative orthant, i.e., $\mathbb{R}^n_- = \{x \in \mathbb{R}^n : x_i < 0 \ \forall i\}$. We consider a random variable $X = (X_1, \ldots, X_n)^\top \in \mathbb{R}^n$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We denote by $F$ the cumulative distribution function (cdf) of $X$ and by $f$ its probability density function (pdf). Let $C$ denote the copula of $X$ and $c$ its copula density function. Analogously, let $F_i$ and $f_i$ denote the cdf and pdf, respectively, of the marginal $X_i$, for all $i \in \{1, \ldots, n\}$. In addition, we denote by $F_i^{-1}$ the generalized inverse of $F_i$, i.e., $F_i^{-1}(u) = \inf\{v \in \mathbb{R} : F_i(v) \ge u\}$.

We denote by $M_X$ the (extended) moment generating function of $X$:

$$M_X(u) = \mathbb{E}\big[e^{\langle u, X \rangle}\big], \tag{1}$$

for all $u \in \mathbb{C}^n$ such that $M_X(u)$ exists. Let us also define the set

$$\mathcal{I} = \{R \in \mathbb{R}^n : M_X(R) < \infty \text{ and } M_X(R + i\,\cdot) \in L^1(\mathbb{R}^n)\}.$$

In the sequel, we will assume that the following condition is in force.

Assumption (B). $\mathcal{I} \cap \mathbb{R}^n_- \neq \emptyset$.

Remark 1 The integrability of the moment generating function required by Assumption (B) has the following implications:

(a) the distribution function $F$ is absolutely continuous with respect to the Lebesgue measure;

(b) the density function $f$ is bounded and continuous;

(c) the marginal distribution functions $F_i$ are also absolutely continuous.

See [17, Proposition 2.5] for (a) and (b) and [8, Theorem 12.2] for (c).

Theorem 1 Let $X$ be a random variable that satisfies Assumption (B). The copula of $X$ is provided by

$$C(u) = \frac{1}{(-2\pi)^n}\int_{\mathbb{R}^n} M_X(R+iv)\,\frac{e^{-\langle R+iv, x\rangle}}{\prod_{i=1}^{n}(R_i + iv_i)}\,dv\,\bigg|_{x_i = F_i^{-1}(u_i)}, \tag{2}$$

where $u \in [0,1]^n$ and $R \in \mathcal{I} \cap \mathbb{R}^n_-$.

Proof Assumption (B) implies that $F_1, \ldots, F_n$ are continuous, and we know from Sklar's theorem that the copula of $X$ is unique and provided by

$$C(u_1, \ldots, u_n) = F\big(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)\big); \tag{3}$$

see, e.g., [14, Theorem 5.3] for a proof in this setting and [16] for an elegant proof in the general case.

We will evaluate the joint cdf $F$ using the methodology of Fourier methods for option pricing. That is, we will think of the cdf as the "price" of a digital option on several fictitious assets. Let us define the function

$$g(y) = \mathbf{1}_{\{y_1 \le x_1, \ldots, y_n \le x_n\}}(y), \qquad x, y \in \mathbb{R}^n, \tag{4}$$

and denote by $\hat g$ its Fourier transform. Then we have that

$$\begin{aligned}
F(x) &= \mathbb{P}(X_1 \le x_1, \ldots, X_n \le x_n) = \mathbb{E}\big[\mathbf{1}_{\{X_1 \le x_1, \ldots, X_n \le x_n\}}\big] = \mathbb{E}[g(X)] \\
&= \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n} M_X(R+iv)\,\hat g(iR - v)\,dv,
\end{aligned} \tag{5}$$

where we have applied Theorem 3.2 in [6]. The prerequisites of this theorem are satisfied due to Assumption (B) and because $g_R \in L^1(\mathbb{R}^n)$, where $g_R(x) := e^{-\langle R, x \rangle} g(x)$ for $R \in \mathbb{R}^n_-$.

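Finally, the statement follows from (3) and (5) once we have computed the Fourier transform of $g$. We have, for $R_i < 0$, $i \in \{1, \ldots, n\}$,

$$\begin{aligned}
\hat g(iR - v) &= \int_{\mathbb{R}^n} e^{i\langle iR - v, y \rangle} g(y)\,dy = \int_{\mathbb{R}^n} e^{-\langle R + iv, y \rangle}\,\mathbf{1}_{\{y_1 \le x_1, \ldots, y_n \le x_n\}}(y)\,dy \\
&= \prod_{i=1}^{n}\int_{-\infty}^{x_i} e^{-(R_i + iv_i) y_i}\,dy_i = (-1)^n \prod_{i=1}^{n}\frac{e^{-(R_i + iv_i) x_i}}{R_i + iv_i},
\end{aligned} \tag{6}$$

which concludes the proof.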
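Theorem 1 can be checked numerically in two dimensions. The sketch below evaluates (2) for a standard bivariate normal vector, the setting of Example 1 below, with a plain trapezoidal rule on the truncated domain $[-L, L]^2$. The choices $R = (-1, -1)$, $L$ and the grid size are illustrative, and the marginal quantiles come from Python's `statistics.NormalDist` rather than from a Fourier inversion. For correlation $\rho = 0.5$ the value at $(1/2, 1/2)$ can be compared with the closed form $1/4 + \arcsin(\rho)/(2\pi) = 1/3$.

```python
import cmath
import math
from statistics import NormalDist

def gaussian_mgf(z1, z2, rho):
    """MGF of a standard bivariate normal vector with correlation rho (entire in z1, z2)."""
    return cmath.exp(0.5 * (z1 * z1 + 2.0 * rho * z1 * z2 + z2 * z2))

def copula_fourier(mgf, u1, u2, marginal=NormalDist(), R=(-1.0, -1.0), L=10.0, N=160):
    """C(u1, u2) from (2) with n = 2:
    (-2 pi)^-2 * int M(R+iv) e^{-<R+iv,x>} / prod_i (R_i+iv_i) dv, x_i = F_i^{-1}(u_i).
    Trapezoidal rule on [-L, L]^2; requires R_1, R_2 < 0."""
    x1, x2 = marginal.inv_cdf(u1), marginal.inv_cdf(u2)
    h = 2.0 * L / N
    nodes = [(0.5 if k in (0, N) else 1.0, -L + k * h) for k in range(N + 1)]

    def factors(Ri, xi):
        # one-dimensional factors e^{-(R_i+iv_i)x_i} / (R_i+iv_i) on the grid
        return [(w, complex(Ri, v), cmath.exp(-complex(Ri, v) * xi) / complex(Ri, v))
                for w, v in nodes]

    total = 0.0
    for w1, z1, g1 in factors(R[0], x1):
        for w2, z2, g2 in factors(R[1], x2):
            total += w1 * w2 * mgf(z1, z2) * g1 * g2
    return (total * h * h / (-2.0 * math.pi) ** 2).real

if __name__ == "__main__":
    print(copula_fourier(lambda a, b: gaussian_mgf(a, b, 0.5), 0.5, 0.5))  # close to 1/3
```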

Remark 2 If the moment generating function of the marginals is known, the inverse function can be easily computed numerically. We have that

$$F_i^{-1}(u) = \inf\{v \in \mathbb{R} : F_i(v) \ge u\} = \inf\{v \in \mathbb{R} : \mathbb{E}[\mathbf{1}_{\{X_i \le v\}}] \ge u\},$$

where the expectation can be computed using (5) again, while a root-finding algorithm provides the infimum (using the continuity of $F_i$).
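Remark 2 can be illustrated in one dimension. With $n = 1$, (5) and (6) give $F_i(x) = -\frac{1}{2\pi}\int_{\mathbb{R}} M_{X_i}(R+iv)\,\frac{e^{-(R+iv)x}}{R+iv}\,dv$ for $R < 0$, and bisection then approximates the generalized inverse. The sketch assumes a continuous, non-decreasing cdf and a user-chosen bracketing interval (here $[-10, 10]$ for a standard normal marginal); the quadrature parameters are illustrative.

```python
import cmath
import math

def fourier_cdf_1d(mgf, x, R=-1.0, L=12.0, N=480):
    """F(x) from (5)-(6) with n = 1: -(1/(2 pi)) int M(R+iv) e^{-(R+iv)x} / (R+iv) dv, R < 0."""
    h = 2.0 * L / N
    total = 0.0
    for k in range(N + 1):
        z = complex(R, -L + k * h)
        weight = 0.5 if k in (0, N) else 1.0
        total += weight * mgf(z) * cmath.exp(-z * x) / z
    return (-total * h / (2.0 * math.pi)).real

def generalized_inverse(cdf, u, lo=-10.0, hi=10.0, iters=60):
    """Approximate inf{v : F(v) >= u} by bisection on [lo, hi] (F continuous, non-decreasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    normal_mgf = lambda z: cmath.exp(0.5 * z * z)
    print(generalized_inverse(lambda v: fourier_cdf_1d(normal_mgf, v), 0.975))  # about 1.96
```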

We can also compute the copula density function using Fourier methods, which 
resembles the computation of Greeks in option pricing. 

Lemma 1 Let $X$ be a random variable that satisfies Assumption (B) and assume further that the marginal distribution functions $F_1, \ldots, F_n$ are strictly increasing and continuously differentiable. Then, the copula density function $c$ of $X$ is provided by

$$c(u) = \frac{1}{(2\pi)^n \prod_{i=1}^{n} f_i(x_i)}\int_{\mathbb{R}^n} M_X(R+iv)\,e^{-\langle R+iv, x\rangle}\,dv\,\bigg|_{x_i = F_i^{-1}(u_i)}, \tag{7}$$

where $u \in (0,1)^n$ and $R \in \mathcal{I} \cap \mathbb{R}^n_-$.

Proof The distribution functions $F$ and $F_1, \ldots, F_n$ are absolutely continuous, hence the copula density exists, cf. [14, p. 197]. Let $u \in (0,1)^n$; then $x_i = F_i^{-1}(u_i)$ is finite for every $i \in \{1, \ldots, n\}$, hence $e^{-\langle R+iv, x\rangle}$ is bounded. Using Assumption (B), we get that the function $v \mapsto M_X(R+iv)\,e^{-\langle R+iv, x\rangle}$ is integrable and we can interchange differentiation and integration. Then we have that


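$$\begin{aligned}
c(u) &= \frac{\partial^n}{\partial u_1 \cdots \partial u_n} C(u_1, \ldots, u_n) \\
&= \frac{\partial^n}{\partial u_1 \cdots \partial u_n}\,\frac{1}{(-2\pi)^n}\int_{\mathbb{R}^n} M_X(R+iv)\,\frac{e^{-\langle R+iv, x\rangle}}{\prod_{i=1}^{n}(R_i + iv_i)}\,dv\,\bigg|_{x_i = F_i^{-1}(u_i)} \\
&= \frac{1}{(-2\pi)^n}\int_{\mathbb{R}^n} \frac{M_X(R+iv)}{\prod_{i=1}^{n}(R_i + iv_i)}\,\frac{\partial^n}{\partial u_1 \cdots \partial u_n}\Big(e^{-\langle R+iv, x\rangle}\Big|_{x_i = F_i^{-1}(u_i)}\Big)\,dv.
\end{aligned} \tag{8}$$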


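Now, since the marginal distribution functions are continuously differentiable, using the chain rule and the inverse function theorem we get that

$$\frac{\partial^n}{\partial u_1 \cdots \partial u_n}\Big(e^{-\langle R+iv, x\rangle}\Big|_{x_i = F_i^{-1}(u_i)}\Big) = (-1)^n \prod_{i=1}^{n}(R_i + iv_i)\,\frac{e^{-\langle R+iv, x\rangle}}{\prod_{i=1}^{n} f_i(x_i)}\,\bigg|_{x_i = F_i^{-1}(u_i)}, \tag{9}$$

which combined with (8) yields the required result.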
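Lemma 1 can be checked in the same Gaussian setting. The sketch evaluates (7) for a standard bivariate normal vector with a trapezoidal rule; both marginals are taken to be standard normal via `statistics.NormalDist` (an assumption for self-containedness), and $R$, $L$ and the grid are illustrative. For correlation $\rho$, the Gaussian copula density at $(1/2, 1/2)$ equals $1/\sqrt{1-\rho^2}$, which gives a direct check.

```python
import cmath
import math
from statistics import NormalDist

def gaussian_mgf(z1, z2, rho):
    """MGF of a standard bivariate normal vector with correlation rho (entire in z1, z2)."""
    return cmath.exp(0.5 * (z1 * z1 + 2.0 * rho * z1 * z2 + z2 * z2))

def copula_density_fourier(mgf, u1, u2, marginal=NormalDist(), R=(-1.0, -1.0), L=10.0, N=160):
    """c(u1, u2) from (7) with n = 2 and identical marginals `marginal`:
    (2 pi)^-2 * int M(R+iv) e^{-<R+iv,x>} dv / (f_1(x_1) f_2(x_2)), x_i = F_i^{-1}(u_i)."""
    x1, x2 = marginal.inv_cdf(u1), marginal.inv_cdf(u2)
    h = 2.0 * L / N
    nodes = [(0.5 if k in (0, N) else 1.0, -L + k * h) for k in range(N + 1)]
    total = 0.0
    for w1, v1 in nodes:
        z1 = complex(R[0], v1)
        e1 = cmath.exp(-z1 * x1)
        for w2, v2 in nodes:
            z2 = complex(R[1], v2)
            total += w1 * w2 * mgf(z1, z2) * e1 * cmath.exp(-z2 * x2)
    integral = (total * h * h).real / (2.0 * math.pi) ** 2
    return integral / (marginal.pdf(x1) * marginal.pdf(x2))

if __name__ == "__main__":
    print(copula_density_fourier(lambda a, b: gaussian_mgf(a, b, 0.5), 0.5, 0.5))
```

Unlike (2), the integrand here lacks the factors $1/(R_i + iv_i)$, so it decays more slowly in general, which matches the remark in Sect. 4 that the copula function is cheaper to compute than its density.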

A natural application of these representations is the calculation of the copula of a random variable $X_t$ from a multidimensional stochastic process $X = (X_t)_{t \ge 0}$. There are many examples of stochastic processes where the corresponding characteristic functions are known explicitly. Prominent examples are Lévy processes, self-similar additive ("Sato") processes and affine processes.

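Corollary 1 Let $X = (X_t)_{t \ge 0}$ be an $\mathbb{R}^n$-valued stochastic process on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$. Assume that the random variable $X_t$, $t > 0$, satisfies Assumption (B). Then, the copula of $X_t$ is provided by

$$C_t(u) = \frac{1}{(-2\pi)^n}\int_{\mathbb{R}^n} M_{X_t}(R+iv)\,\frac{e^{-\langle R+iv, x\rangle}}{\prod_{i=1}^{n}(R_i + iv_i)}\,dv\,\bigg|_{x_i = F_{X_t^i}^{-1}(u_i)}, \tag{10}$$

where $u \in [0,1]^n$, $R \in \mathcal{I} \cap \mathbb{R}^n_-$ (with $\mathcal{I}$ defined for $X_t$), and $F_{X_t^i}$ denotes the cdf of the $i$th component of $X_t$. An analogous statement holds for the copula density function $c_t$ of $X_t$.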


3 Examples 

We will demonstrate the applicability and flexibility of Fourier methods for the computation of copulas using two examples. First, we consider a 2D normal random variable and next a 2D normal inverse Gaussian (NIG) Lévy process. Although the copula of the normal random variable is the well-known Gaussian copula, little was known about the copula of the NIG distribution until recently; see Theorem 5.13 in [18] for a special case. Hammerstein [19, Chap. 2] has now provided a general characterization of the (implied) copula of the multidimensional NIG distribution using properties of normal mean-variance mixtures.

Example 1 The first example is simply a "sanity check" for the proposed method. We consider the two-dimensional Gaussian distribution and compute the corresponding copula for correlation values $\rho \in \{-1, 0, 1\}$; see Fig. 1 for the resulting contour plots. Of course, the copula of this example is the Gaussian copula, which for correlation coefficients equal to $-1$, $0$ and $1$ corresponds to the countermonotonicity copula, the independence copula and the comonotonicity copula, respectively. This is also evident from Fig. 1.

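Example 2 Let $X = (X_t)_{t \ge 0}$ be a two-dimensional NIG Lévy process, i.e.,

$$X_t = (X_t^1, X_t^2) \sim \mathrm{NIG}_2(\alpha, \beta, \delta t, \mu t, \Delta), \qquad t \ge 0. \tag{11}$$

The parameters satisfy $\alpha, \delta > 0$, $\beta, \mu \in \mathbb{R}^2$, and $\Delta \in \mathbb{R}^{2 \times 2}$ is a symmetric, positive definite matrix (w.l.o.g. we can assume $\det(\Delta) = 1$). Moreover, $\alpha^2 > \langle \beta, \Delta\beta \rangle$. The moment generating function of $X_1$, for $u \in \mathbb{R}^2$ with $\alpha^2 - \langle \beta + u, \Delta(\beta + u) \rangle > 0$, is

$$M_{X_1}(u) = \exp\Big\{\langle u, \mu \rangle + \delta\Big(\sqrt{\alpha^2 - \langle \beta, \Delta\beta \rangle} - \sqrt{\alpha^2 - \langle \beta + u, \Delta(\beta + u) \rangle}\Big)\Big\};$$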
Fig. 1 Contour plots of copulas for Example 1


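cf. [1]. The marginals are also NIG distributed and we have that $X_t^i \sim \mathrm{NIG}(\alpha^i, \beta^i, \delta^i t, \mu^i t)$, where

$$\alpha^i = \Delta_{ii}^{-1/2}\Big(\alpha^2 - \beta_j^2\big(\Delta_{jj} - \Delta_{ij}^2\Delta_{ii}^{-1}\big)\Big)^{1/2}, \quad \beta^i = \beta_i + \beta_j\Delta_{ij}\Delta_{ii}^{-1}, \quad \delta^i = \delta\sqrt{\Delta_{ii}}, \quad \mu^i = \mu_i,$$

for $i \in \{1, 2\}$ and $j = 3 - i$; cf., e.g., [2, Theorem 1]. Assumption (B) is satisfied for $R \in \mathbb{R}^2$ such that $\alpha^2 - \langle \beta + R, \Delta(\beta + R) \rangle > 0$; see Appendix B in [6]. Since $\alpha^2 > \langle \beta, \Delta\beta \rangle$, this set of admissible $R$ is an open neighborhood of the origin, hence $\mathcal{I} \cap \mathbb{R}^2_- \neq \emptyset$.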
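The marginal parameters above can be verified numerically: setting one argument of the bivariate moment generating function to zero must reproduce the univariate NIG moment generating function with parameters $(\alpha^i, \beta^i, \delta^i t, \mu^i t)$. The sketch does this for real arguments, using $M_{X_t}(u) = M_{X_1}(u)^t$; the parameter values in the check are arbitrary illustrative choices (with $\det \Delta = 1$), not those of the numerical example.

```python
import math

def nig2_mgf(u, alpha, beta, delta, mu, Delta, t=1.0):
    """M_{X_t}(u) = M_{X_1}(u)^t for the bivariate NIG Levy process (11), real u with
    alpha^2 - <beta+u, Delta(beta+u)> > 0."""
    def quad(w):  # <w, Delta w>
        return sum(w[i] * Delta[i][j] * w[j] for i in range(2) for j in range(2))
    bu = (beta[0] + u[0], beta[1] + u[1])
    expo = (u[0] * mu[0] + u[1] * mu[1]
            + delta * (math.sqrt(alpha ** 2 - quad(beta)) - math.sqrt(alpha ** 2 - quad(bu))))
    return math.exp(t * expo)

def nig1_mgf(u, alpha, beta, delta, mu, t=1.0):
    """MGF of a univariate NIG(alpha, beta, delta t, mu t) random variable."""
    return math.exp(t * (u * mu + delta * (math.sqrt(alpha ** 2 - beta ** 2)
                                           - math.sqrt(alpha ** 2 - (beta + u) ** 2))))

def nig2_marginal(i, alpha, beta, delta, mu, Delta):
    """(alpha^i, beta^i, delta^i, mu^i) of the i-th marginal (0-based i, j = the other index)."""
    j = 1 - i
    d_ii, d_jj, d_ij = Delta[i][i], Delta[j][j], Delta[i][j]
    alpha_i = math.sqrt((alpha ** 2 - beta[j] ** 2 * (d_jj - d_ij ** 2 / d_ii)) / d_ii)
    beta_i = beta[i] + beta[j] * d_ij / d_ii
    delta_i = delta * math.sqrt(d_ii)
    return alpha_i, beta_i, delta_i, mu[i]
```

The identity follows from completing the square in $\langle \beta + u, \Delta(\beta + u) \rangle$ when only one component of $u$ is non-zero.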

Therefore, we can apply Theorem 1 to compute the copula of the NIG distribution. The parameters used in the numerical example are similar to [6, pp. 233-234]: $\alpha = 10.20$, $\beta = (\ldots)^\top$, $\delta = 0.150$, $\mu = 0$, and two matrices $\Delta_+$ and $\Delta_-$, which lead to positive and negative correlation, respectively. The correlation coefficients are $\rho_+ = 0.1015$ and $\rho_- = -0.687$.

The contour plots are exhibited in Figs. 2 and 3 and clearly show the influence of the different mixing matrices $\Delta_+$ and $\Delta_-$ on the dependence structure. Moreover, we can also observe that time has a significant effect on the dependence structure of the multidimensional NIG Lévy process. This is an interesting observation, since the correlation matrix is invariant over time (which is true for any Lévy process with finite second moments).


4 Final Remarks 

We will not elaborate on the speed of Fourier methods compared with Monte Carlo methods in the multidimensional case; the interested reader is referred to [7] for a careful analysis. Moreover, [20] provides recommendations on the efficient implementation of Fourier integrals using sparse grids in order to deal with the "curse of dimensionality." Let us point out, though, that the computation of the copula function will be much quicker than the computation of the copula density, since the integrand in (2) decays much faster than the one in (7). One should think of the analogy to option prices and option Greeks again. Finally, it seems tempting to use these formulas for the computation of tail dependence coefficients. However, due to numerical instabilities at the limits, they did not yield any meaningful results.

Fig. 2 Contour plots of copulas for NIG, t = 1 (panels: 2D NIG, positive correlation; 2D NIG, negative correlation)

Fig. 3 Contour plots of copulas for NIG, t = \ (panels: 2D NIG, positive correlation; 2D NIG, negative correlation)
Abstract A goodness-of-fit transformation for Archimedean copulas is presented from which a test can be derived. In a large-scale simulation study it is shown that the test performs well with respect to the error probability of the first kind and the power under several alternatives, especially in high dimensions, where this test is (still) easy to apply. The test is compared to commonly applied tests for Archimedean copulas. However, these are usually numerically demanding (in terms of precision and runtime), especially when the dimension is large. The transformation underlying the newly proposed test was originally used for sampling random variates from Archimedean copulas. Its correctness is proven under weaker assumptions. It may be interpreted as an analogue to Rosenblatt's transformation, which is linked to the conditional distribution method for sampling random variates. Furthermore, the suggested goodness-of-fit test complements a commonly used goodness-of-fit test based on the Kendall distribution function in the sense that it utilizes all other components of the transformation except the Kendall distribution function. Finally, a graphical test based on the proposed transformation is presented.
1 Introduction


From risk management practice, there is an increasing interest in copula theory and applications in high dimensions. One of the reasons is that vectors of risk factor changes are typically high-dimensional and have to be adequately modeled; see [23, Chap. 2]. In high dimensions, the inherent model risk can be substantial. It is, thus, of interest to test whether an estimated or assumed (dependence) model is appropriate. One of our goals is, therefore, to present and explore goodness-of-fit tests in high dimensions for a class of copulas widely used in practice, namely Archimedean copulas. We also investigate the influence of the dimension on the conducted goodness-of-fit tests and address the problems that arise specifically in high dimensions.

It is clear that especially in high dimensions, the exchangeability of Archimedean 
copulas becomes an increasingly strong assumption for certain applications. This 
point of criticism applies equally well to all exchangeable copula models including 
the well-known homogeneous Gaussian or t copulas. However, note that these models 
are indeed applied in banks and insurance companies, typically in high dimensions, in 
order to introduce (tail-) dependence to joint models for risks as opposed to assuming 
(tail) independence. We therefore believe that it is important to investigate such 
models in high dimensions. 

Archimedean copulas are copulas which admit the functional form

$$C(u) = \psi\big(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\big), \qquad u \in [0,1]^d, \tag{1}$$

for an (Archimedean) generator $\psi$, i.e., a continuous, decreasing function $\psi : [0, \infty] \to [0, 1]$ which satisfies $\psi(0) = 1$, $\psi(\infty) = \lim_{t \to \infty}\psi(t) = 0$, and which is strictly decreasing on $[0, \inf\{t : \psi(t) = 0\}]$. A necessary and sufficient condition under which (1) is indeed a proper copula is that $\psi$ is $d$-monotone, i.e., $\psi$ is continuous on $[0, \infty]$, admits derivatives up to the order $d-2$ satisfying $(-1)^k\psi^{(k)}(t) \ge 0$ for all $k \in \{0, \ldots, d-2\}$, $t \in (0, \infty)$, and $(-1)^{d-2}\psi^{(d-2)}(t)$ is decreasing and convex on $(0, \infty)$, see [20] or [22]. For reasons why Archimedean copulas are used in practice, see [9] or [19].
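As an illustration of (1), the sketch below uses the Clayton generator $\psi(t) = (1+t)^{-1/\theta}$, $\theta > 0$, which is completely monotone and hence $d$-monotone for every $d$; it is chosen here only as an example. The generic Archimedean formula is compared with the well-known Clayton closed form $\big(\sum_j u_j^{-\theta} - d + 1\big)^{-1/\theta}$.

```python
def clayton_psi(t, theta):
    """Clayton generator psi(t) = (1 + t)^(-1/theta), theta > 0."""
    return (1.0 + t) ** (-1.0 / theta)

def clayton_psi_inv(u, theta):
    """Inverse generator psi^{-1}(u) = u^(-theta) - 1."""
    return u ** (-theta) - 1.0

def archimedean_copula(u, psi, psi_inv):
    """C(u) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d)) as in (1)."""
    return psi(sum(psi_inv(uj) for uj in u))

if __name__ == "__main__":
    theta = 2.0
    psi = lambda t: clayton_psi(t, theta)
    psi_inv = lambda u: clayton_psi_inv(u, theta)
    print(archimedean_copula([0.3, 0.6, 0.8], psi, psi_inv))
```

Since $\psi^{-1}(1) = 0$, setting a coordinate to 1 recovers the lower-dimensional margin, e.g. $C(u, 1) = u$.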

Goodness-of-fit techniques for copulas have only recently gained interest, see, e.g., [5, 6, 8, 11-14] and references therein. Although usually presented in a $d$-dimensional setting, only some of the publications actually try to apply goodness-of-fit tests in more than two dimensions, including [5, 26] up to dimension $d = 5$ and [4] up to dimension $d = 8$. The common deficiency of goodness-of-fit tests for copulas in general, but also for the class of Archimedean copulas, is their limited applicability when the dimension becomes large. This is mainly due to the lack of a simple or at least numerically accessible form as the dimension becomes large. Furthermore, parameter estimation usually becomes much more demanding in high dimensions; see [19].

As a general goodness-of-fit test, the transformation of [25] is well known. It is important to note that the inverse of this transformation leads to a popular sampling algorithm, the conditional distribution method, see, e.g., [10]. In other words, for a bijective transformation which converts $d$ independent and identically distributed ("i.i.d.") standard uniform random variables to a $d$-dimensional random vector distributed according to some copula $C$, the corresponding inverse transformation may be applied to obtain $d$ i.i.d. standard uniform random variables from a $d$-dimensional random vector following the copula $C$. In this work, we apply this idea to goodness-of-fit testing based on a transformation originally proposed by [29] for sampling Archimedean copulas. With the recent work of [22], we obtain a more elegant proof of the correctness of this transformation under weaker assumptions. We then apply the first $d-1$ components to build a general goodness-of-fit test for $d$-dimensional Archimedean copulas. This complements goodness-of-fit tests based on the $d$th component, the Kendall distribution function, see, e.g., [13, 26], or [14]. Our proposed test can be interpreted as an Archimedean analogue to goodness-of-fit tests based on Rosenblatt's transformation for copulas in general, as it establishes a link between a sampling algorithm and a goodness-of-fit test. The appealing property of tests based on the inverse of the transformation of [29] for Archimedean copulas is that they are easily applied in any dimension, whereas tests based on Rosenblatt's transformation, as well as tests based on the Kendall distribution function, are typically numerically challenging. The transformation can also be conveniently used for graphical goodness-of-fit testing, as recently advocated by [16].

This paper is organized as follows. In Sect. 2, commonly used goodness-of-fit 
tests for copulas in general are recalled. In Sect. 3, the new goodness-of-fit test for 
Archimedean copulas is presented. Section 4 contains details about the conducted 
simulation study. The results are presented in Sect. 5 and the graphical goodness-of-fit 
test is detailed in Sect. 6. Finally, Sect. 7 concludes. 


2 Goodness-of-fit Tests for Copulas 

Let X = (X\, . . . , Xj), d > 2, denote a random vector with distribution function 
H and continuous marginals F \ , . . . , Fj. In a copula model for X , one would like to 
know whether C is well represented by a parametric family % = {C (•; 0 ) : 0 e 0} 
where 0 is an open subset of M^ 7 , p e N. In other words, one would like to test the 
null hypothesis 


H 0 : C (2) 

based on realizations of independent copies X L , i e {1, . . . , n}, of X. For testing Ho, 
the (usually unknown) marginal distributions are treated as nuisance parameters and 
are replaced by their slightly scaled empirical counterparts, the pseudo -observations 
Vi = (Uii, . . . , Uid), i € {1, . . . , n], with 


Fnj(Xij), i je d}. 


(3) 
where $F_{nj}(x) = \frac{1}{n}\sum_{k=1}^{n}\mathbf{1}_{\{X_{kj} \le x\}}$ denotes the empirical distribution function of the $j$th data column (the data matrix consisting of the entries $X_{ij}$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, d\}$), see [14]. Following the latter approach, one ends up with rank-based pseudo-observations, which are interpreted as observations of $C$ (besides the known issues of this interpretation, see Remark 1 below) and are, therefore, used for estimating $\theta$ and testing $H_0$.
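The scaled pseudo-observations (3) are simply ranks divided by $n+1$, since $\frac{n}{n+1}F_{nj}(X_{ij}) = R_{ij}/(n+1)$, where $R_{ij}$ is the rank of $X_{ij}$ within column $j$. A minimal sketch, assuming the data contain no ties (tie handling is not discussed here):

```python
def pseudo_observations(data):
    """Pseudo-observations (3): U_ij = n/(n+1) * F_nj(X_ij) = rank of X_ij in column j / (n+1).

    Assumes the columns contain no ties (ties would need an explicit tie-breaking rule)."""
    n, d = len(data), len(data[0])
    pobs = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            pobs[i][j] = rank / (n + 1)
    return pobs

if __name__ == "__main__":
    print(pseudo_observations([[3.1, 0.2], [1.2, 0.9], [2.5, 0.5]]))
```

The scaling by $n/(n+1)$ keeps all pseudo-observations strictly inside $(0, 1)$, so they can be plugged into copula densities or inverse generators without hitting the boundary.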

In order to conduct a goodness-of-fit test, the pseudo-observations $U_i$, $i \in \{1, \ldots, n\}$, are usually first transformed to some variables $U_i'$, $i \in \{1, \ldots, n\}$, so that the distribution of the latter is known and sufficiently simple to test under the null hypothesis. For Rosenblatt's transformation (see Sect. 2.1), $U_i'$, $i \in \{1, \ldots, n\}$, is also $d$-dimensional; for tests based on the Kendall distribution function (described in Sect. 2.2), it is one-dimensional; and for the goodness-of-fit approach we propose in Sect. 3, it is $(d-1)$-dimensional. If not already one-dimensional, after such a transformation, $U_i'$, $i \in \{1, \ldots, n\}$, is usually mapped to one-dimensional quantities $Y_i'$, $i \in \{1, \ldots, n\}$, such that the corresponding distribution $F_{Y'}$ is again known under the null hypothesis. So indeed, instead of (2), one usually considers some adjusted hypothesis $H_0^*$ on $F_{Y'}$, under which a goodness-of-fit test can easily be carried out in a one-dimensional setting. For mapping the variates to a one-dimensional setting, different approaches exist, see Sect. 2.2. Note that if $H_0^*$ is rejected, so is $H_0$.

Remark 1 As, e.g., [8] describe, there are two problems with the approach described above. First, the pseudo-observations $U_i$, $i \in \{1, \ldots, n\}$, are neither realizations of perfectly independent random vectors nor do their components perfectly follow univariate standard uniform distributions. This affects the null distribution of the test statistic under consideration. All copula goodness-of-fit approaches suffer from these effects, since realizations of the underlying copula are never directly observed in practice. A solution may be a bootstrap to access the exact null distribution. Particularly in high dimensions, however, such a bootstrap is often time-consuming, especially for the goodness-of-fit tests suggested in the copula literature so far. Second, using estimated copula parameters additionally affects the null distribution.


2.1 Rosenblatt’s Transformation and a Corresponding Test 

The transformation introduced by [25] is a standard approach for obtaining realizations of standard uniform random vectors $U_i'$, $i \in \{1, \ldots, n\}$, given random vectors $U_i$, $i \in \{1, \ldots, n\}$, from an absolutely continuous copula $C$, which can then be tested directly or further mapped to one-dimensional variates for testing purposes. Consider a representative $d$-dimensional random vector $U \sim C$. To obtain $U' \sim \mathrm{U}[0,1]^d$ (i.e., a random vector with independent components, each uniformly distributed on $[0, 1]$), [25] proposed the transformation $R : U \mapsto U'$, given by

$$\begin{aligned}
U_1' &= U_1, \\
U_2' &= C_2(U_2 \mid U_1), \\
&\;\;\vdots \\
U_d' &= C_d(U_d \mid U_1, \ldots, U_{d-1}),
\end{aligned}$$

where, for $j \in \{2, \ldots, d\}$, $C_j(u_j \mid u_1, \ldots, u_{j-1})$ denotes the conditional distribution function of $U_j$ given $U_1 = u_1, \ldots, U_{j-1} = u_{j-1}$. We denote this method for constructing goodness-of-fit tests by "R" in what follows.

Remark 2 Note that the inverse transformation $R^{-1}$ of Rosenblatt’s transformation leads to the conditional distribution method for sampling copulas; see, e.g., [10]. This link gives rise to the general idea of using sampling algorithms based on one-to-one transformations to construct goodness-of-fit tests. This is done in Sect. 3 to construct a goodness-of-fit test for Archimedean copulas based on a transformation originally proposed by [29] for sampling random variates.
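To make the link in Remark 2 concrete, the following Python sketch implements $R^{-1}$, i.e., the conditional distribution method, for a Clayton copula. It assumes the generator $\psi(t) = (1+t)^{-1/\theta}$, $\theta > 0$. For this family, the conditional distribution functions in (5) reduce to $\bigl((1+s_j)/(1+s_{j-1})\bigr)^{-(1/\theta + j - 1)}$ with $s_j = \sum_{k=1}^{j}\psi^{-1}(u_k)$, which can be solved for $u_j$ in closed form. The function name `clayton_cdm` is hypothetical.

```python
import numpy as np


def clayton_cdm(w, theta):
    """Inverse Rosenblatt transform R^{-1} for a d-dimensional Clayton copula.

    Assumes psi(t) = (1 + t)^(-1/theta), theta > 0. Each row of w holds i.i.d.
    U[0,1] variates; each returned row follows the Clayton copula.
    """
    w = np.atleast_2d(np.asarray(w, dtype=float))
    d = w.shape[1]
    u = np.empty_like(w)
    u[:, 0] = w[:, 0]                      # U_1 = U'_1
    s = w[:, 0] ** (-theta) - 1.0          # s_1 = psi^{-1}(U_1)
    for j in range(2, d + 1):              # 1-based j as in Eq. (5)
        # Solve ((1 + s_j) / (1 + s_{j-1}))^(-(1/theta + j - 1)) = w_j for s_j.
        s_new = (1.0 + s) * w[:, j - 1] ** (-1.0 / (1.0 / theta + j - 1)) - 1.0
        # psi^{-1}(u_j) = s_j - s_{j-1}, hence u_j = psi(s_j - s_{j-1}).
        u[:, j - 1] = (1.0 + s_new - s) ** (-1.0 / theta)
        s = s_new
    return u
```

If you feed in rows of i.i.d. uniforms, the output is a Clayton sample. For $\theta = 1$, the empirical mass of $[0, 0.5]^2$ should then be close to $C(0.5, 0.5) = 1/3$.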

To find the quantities $C_j(u_j \mid u_1, \dots, u_{j-1})$, $j \in \{2, \dots, d\}$, for a specific copula $C$ (under weak conditions), the following connection between conditional distributions and partial derivatives is usually applied; see [27, p. 20]. Assuming $C$ admits continuous partial derivatives with respect to the first $d-1$ arguments, one has

$$C_j(u_j \mid u_1, \dots, u_{j-1}) = \frac{D_{j-1,\dots,1}\, C^{(1,\dots,j)}(u_1, \dots, u_j)}{D_{j-1,\dots,1}\, C^{(1,\dots,j-1)}(u_1, \dots, u_{j-1})}, \quad j \in \{2, \dots, d\}, \qquad (4)$$

where $C^{(1,\dots,k)}$ denotes the $k$-dimensional marginal copula of $C$ corresponding to the first $k$ arguments and $D_{j-1,\dots,1}$ denotes the mixed partial derivative of order $j-1$ with respect to the first $j-1$ arguments. For a $d$-dimensional Archimedean copula $C$ with $(d-1)$-times continuously differentiable generator $\psi$, one has

$$C_j(u_j \mid u_1, \dots, u_{j-1}) = \frac{\psi^{(j-1)}\bigl(\sum_{k=1}^{j} \psi^{-1}(u_k)\bigr)}{\psi^{(j-1)}\bigl(\sum_{k=1}^{j-1} \psi^{-1}(u_k)\bigr)}, \quad j \in \{2, \dots, d\}. \qquad (5)$$


The problem when applying (4) or (5) in high dimensions is that it is usually quite difficult to access the derivatives involved; this is the price one has to pay for such a general transformation. Furthermore, numerically evaluating the derivatives is often time-consuming and prone to errors.
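For families with explicit generator derivatives, (5) is straightforward to evaluate. Below is a minimal Python sketch for Clayton’s family. It again assumes $\psi(t) = (1+t)^{-1/\theta}$, so that $\psi^{(k)}(t) = (-1)^k (1+t)^{-(1/\theta+k)}\prod_{l=0}^{k-1}(1/\theta + l)$; the signs and constant factors cancel in the ratio (5). The function name is hypothetical. For other families, obtaining these derivatives is the hard part, as noted above.

```python
import numpy as np


def rosenblatt_clayton(u, theta):
    """Rosenblatt transform R via Eq. (5) for a d-dimensional Clayton copula.

    Assumes psi(t) = (1 + t)^(-1/theta), theta > 0, so that
    C_j(u_j | u_1..u_{j-1}) = ((1 + s_j) / (1 + s_{j-1}))^(-(1/theta + j - 1)),
    with s_j = sum_{k <= j} psi^{-1}(u_k) and psi^{-1}(u) = u^(-theta) - 1.
    """
    u = np.atleast_2d(np.asarray(u, dtype=float))
    d = u.shape[1]
    s = np.cumsum(u ** (-theta) - 1.0, axis=1)  # column j-1 holds s_j
    out = np.empty_like(u)
    out[:, 0] = u[:, 0]                          # U'_1 = U_1
    for j in range(2, d + 1):                    # 1-based j as in Eq. (5)
        ratio = (1.0 + s[:, j - 1]) / (1.0 + s[:, j - 2])
        out[:, j - 1] = ratio ** (-(1.0 / theta + j - 1))
    return out
```

For $d = 2$, $\theta = 1$, and $u = (0.5, 0.5)$, this gives $U_2' = (3/2)^{-2} = 4/9$. This agrees with $\partial C(u_1, u_2)/\partial u_1$ for $C(u_1, u_2) = (u_1^{-1} + u_2^{-1} - 1)^{-1}$.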

Genest et al. [14] propose a test statistic based on the empirical distribution function of the random vectors $U_i'$, $i \in \{1, \dots, n\}$. As an overall result, the authors recommend using a distance between the distribution under $H_0$, assumed to be standard uniform on $[0,1]^d$, and the empirical distribution, namely

$$S_{n,d} = n \int_{[0,1]^d} \bigl(D_n(u) - \Pi(u)\bigr)^2 \,\mathrm{d}u,$$

where $\Pi(u) = \prod_{j=1}^d u_j$ denotes the independence copula and $D_n(u) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{U_i' \le u\}$ denotes the empirical distribution function based on the random vectors $U_i'$, $i \in \{1, \dots, n\}$. We refer to this approach as “$S_{n,d}$” in what follows.
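The integral in $S_{n,d}$ need not be approximated numerically. Expanding the square and integrating term by term gives the closed form
$S_{n,d} = \frac{n}{3^d} - \frac{1}{2^{d-1}}\sum_{i=1}^n\prod_{j=1}^d\bigl(1 - U_{ij}'^{\,2}\bigr) + \frac{1}{n}\sum_{i=1}^n\sum_{k=1}^n\prod_{j=1}^d\bigl(1 - \max(U_{ij}', U_{kj}')\bigr)$.
The following Python sketch evaluates this expression; the function name is hypothetical. It needs $O(n^2 d)$ time and memory, which is unproblematic for $n = 150$.

```python
import numpy as np


def cvm_independence_stat(u_prime):
    """S_{n,d} = n * int_{[0,1]^d} (D_n(u) - Pi(u))^2 du, in closed form."""
    x = np.atleast_2d(np.asarray(u_prime, dtype=float))
    n, d = x.shape
    term_pi = n / 3.0 ** d                                   # n * int Pi^2
    term_cross = 2.0 ** (1 - d) * np.sum(np.prod(1.0 - x ** 2, axis=1))
    # Componentwise maxima of all pairs (i, k), shape (n, n, d).
    pair_max = np.maximum(x[:, None, :], x[None, :, :])
    term_dn = np.sum(np.prod(1.0 - pair_max, axis=2)) / n    # n * int D_n^2
    return float(term_pi - term_cross + term_dn)
```

For a single point $U_1' = 0.5$ in $d = 1$, the statistic is $\int_0^1 (\mathbb{1}\{0.5 \le u\} - u)^2\,\mathrm{d}u = 1/12$.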


2.2 Tests in a One-Dimensional Setting 

In order to apply a goodness-of-fit test in a one-dimensional setting, one has to summarize the $d$-dimensional pseudo-observations $U_i$ or $U_i'$ via one-dimensional quantities $Y_i$, $i \in \{1, \dots, n\}$, whose distribution is known under the null hypothesis. In what follows, some popular mappings achieving this task are described.

$N_d$: Under $H_0$, the one-dimensional quantities $Y_i = F_{\chi^2_d}\bigl(\sum_{j=1}^d \Phi^{-1}(U_{ij}')^2\bigr)$, $i \in \{1, \dots, n\}$, should be i.i.d. according to a standard uniform distribution. Here, $F_{\chi^2_d}$ denotes the distribution function of a $\chi^2$ distribution with $d$ degrees of freedom and $\Phi^{-1}$ denotes the quantile function of the standard normal distribution. This transformation can be found, e.g., in [8] and is denoted by “$N_d$” in what follows.

$K_C$: For a copula $C$, let $K_C$ denote the Kendall distribution function, i.e., $K_C(t) = P(C(U) \le t)$, $t \in [0,1]$, where $U \sim C$; see [3] or [22]. Under $H_0$ and if $K_C$ is continuous, the random variables $Y_i = K_C(C(U_i))$ should be i.i.d. according to a standard uniform distribution. This approach for goodness-of-fit testing will be referred to as “$K_C$”. Note that in this case, no multidimensional transformation of the data is performed beforehand.

$K_\Pi$: One can also consider the random vectors $U_i'$, $i \in \{1, \dots, n\}$, in conjunction with the independence copula. That is, define $Y_i = \prod_{j=1}^d U_{ij}'$, where $Y_i$ has distribution function $K_\Pi(t) = \sum_{k=0}^{d-1} t(-\log t)^k/k!$. Under $H_0$, the sample $K_\Pi(Y_i)$, $i \in \{1, \dots, n\}$, should indicate a uniform distribution on the unit interval. This approach is referred to as “$K_\Pi$”.
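The three mappings above can be sketched in Python with NumPy and SciPy. For $K_C$, we assume, purely for illustration, a Clayton copula with generator $\psi(t) = (1+t)^{-1/\theta}$. Its derivatives are explicit, so the standard series $K_C(t) = \sum_{k=0}^{d-1}\psi^{(k)}(\psi^{-1}(t))(-\psi^{-1}(t))^k/k!$ (see [3] or [22]) can be evaluated directly. Other families would need their own derivatives. All function names are hypothetical.

```python
import math

import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2, norm


def map_chi2(u_prime):
    """N_d: Y_i = F_{chi^2_d}(sum_j Phi^{-1}(U'_ij)^2)."""
    x = np.atleast_2d(np.asarray(u_prime, dtype=float))
    return chi2.cdf(np.sum(norm.ppf(x) ** 2, axis=1), df=x.shape[1])


def kendall_df_clayton(t, theta, d):
    """K_C(t) for Clayton via the series, with x = psi^{-1}(t) = t^(-theta) - 1.

    Each term psi^{(k)}(x)(-x)^k / k! equals
    Gamma(1/theta + k) / (Gamma(1/theta) k!) * (x / (1 + x))^k * (1 + x)^(-1/theta);
    the Gamma ratio is computed with gammaln to avoid overflow.
    """
    x = np.asarray(t, dtype=float) ** (-theta) - 1.0
    a = 1.0 / theta
    total = 0.0
    for k in range(d):
        coef = np.exp(gammaln(a + k) - gammaln(a) - gammaln(k + 1.0))
        total = total + coef * (x / (1.0 + x)) ** k * (1.0 + x) ** (-a)
    return total


def map_kc_clayton(u, theta):
    """K_C: Y_i = K_C(C(U_i)), with C(u) = psi(sum_j psi^{-1}(u_j))."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    c = (1.0 + np.sum(u ** (-theta) - 1.0, axis=1)) ** (-1.0 / theta)
    return kendall_df_clayton(c, theta, u.shape[1])


def map_kpi(u_prime):
    """K_Pi: returns K_Pi(Y_i) for Y_i = prod_j U'_ij."""
    x = np.atleast_2d(np.asarray(u_prime, dtype=float))
    y = np.prod(x, axis=1)
    m = -np.log(y)
    return y * sum(m ** k / math.factorial(k) for k in range(x.shape[1]))
```

For $d = 2$ and $\theta = 1$, the series reduces to $K_C(t) = t + t(1 - t)$, so `kendall_df_clayton(0.5, 1.0, 2)` gives 0.75.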

In the approaches $N_d$, $K_C$, and $K_\Pi$, we have to test the hypothesis that realizations of the random variables $Y_i$, $i \in \{1, \dots, n\}$, follow a uniform distribution on the unit interval. This may be achieved in several ways; the following two approaches are applied in what follows.

$\chi^2$: Pearson’s $\chi^2$ test, see [24, p. 391], briefly referred to as “$\chi^2$”.
AD: The so-called Anderson-Darling test, a specifically weighted Cramér-von Mises test, see [1, 2]. This method is referred to as “AD”.
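Both univariate tests are easy to sketch in Python. The Anderson-Darling statistic below is the usual one for a fully specified standard uniform null, $A_n = -n - \frac{1}{n}\sum_{i=1}^n (2i-1)[\log Y_{(i)} + \log(1 - Y_{(n-i+1)})]$. Pearson’s test uses ten equiprobable cells, as in Sect. 4.1, together with SciPy’s `scipy.stats.chisquare`, whose expected frequencies default to equal counts. No AD $p$-value is computed here; in the study, $p$-values come from the bootstrap of Sect. 4.2. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chisquare


def anderson_darling_uniform(y):
    """A_n for H: Y ~ U[0,1], using the order statistics Y_(1) <= ... <= Y_(n)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    i = np.arange(1, n + 1)
    # y[::-1][i - 1] is Y_(n + 1 - i).
    return float(-n - np.sum((2 * i - 1) * (np.log(y) + np.log1p(-y[::-1]))) / n)


def pearson_uniform(y, cells=10):
    """Pearson's chi^2 test of uniformity on [0,1] with equiprobable cells."""
    counts, _ = np.histogram(y, bins=cells, range=(0.0, 1.0))
    return chisquare(counts)  # returns (statistic, pvalue)
```

For example, a single observation $Y_1 = 0.5$ gives $A_1 = -1 + 2\log 2 \approx 0.386$. Ten points, one in each cell, give a Pearson statistic of 0 and a $p$-value of 1.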
3 A Goodness-of-fit Test for Archimedean Copulas

The goodness-of-fit test we now present is based on the following transformation 
from [29] for generating random variates from Archimedean copulas. Note that we 
present a rather short proof of this interesting result, under weaker assumptions. 

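Theorem 1 (The main transformation) Let $U \sim C$, $d \ge 2$, where $C$ is an Archimedean copula with $d$-monotone generator $\psi$ and continuous Kendall distribution function $K_C$. Then $U' \sim U[0,1]^d$, where

$$U_j' = \left(\frac{\sum_{k=1}^{j} \psi^{-1}(U_k)}{\sum_{k=1}^{j+1} \psi^{-1}(U_k)}\right)^{j}, \quad j \in \{1, \dots, d-1\}, \qquad U_d' = K_C\bigl(C(U)\bigr). \qquad (6)$$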

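Proof As shown in [22], $(\psi^{-1}(U_1), \dots, \psi^{-1}(U_d))$ has an $\ell_1$-norm symmetric distribution with survival copula $C$ and radial distribution $F_R = \mathcal{W}_d^{-1}[\psi]$, where $\mathcal{W}_d[\cdot]$ denotes the Williamson $d$-transform. Hence, $(\psi^{-1}(U_1), \dots, \psi^{-1}(U_d)) \stackrel{\mathrm{d}}{=} R S$, where $R \sim F_R$ and $S \sim U(\{x \in \mathbb{R}_+^d : \|x\|_1 = 1\})$ are independent. Let $Z_{(1)} \le \dots \le Z_{(d-1)}$ denote the order statistics of $(Z_1, \dots, Z_{d-1}) \sim U[0,1]^{d-1}$, and set $Z_{(0)} = 0$ and $Z_{(d)} = 1$. It follows from [7, p. 207] that $S_j \stackrel{\mathrm{d}}{=} Z_{(j)} - Z_{(j-1)}$, $j \in \{1, \dots, d\}$, independently of $R$. This implies that $\psi^{-1}(U_j) \stackrel{\mathrm{d}}{=} R(Z_{(j)} - Z_{(j-1)})$, $j \in \{1, \dots, d\}$. In particular, the partial sums satisfy $\sum_{k=1}^{j}\psi^{-1}(U_k) \stackrel{\mathrm{d}}{=} R Z_{(j)}$ jointly in $j$, and $C(U) = \psi\bigl(\sum_{k=1}^{d}\psi^{-1}(U_k)\bigr) \stackrel{\mathrm{d}}{=} \psi(R)$. Hence $U'$ is in distribution equal to $W = \bigl((Z_{(1)}/Z_{(2)})^1, \dots, (Z_{(d-1)}/Z_{(d)})^{d-1}, K_C(\psi(R))\bigr)$. Since $K_C$ is continuous and $\psi(R) \sim K_C$, $K_C(\psi(R))$ is uniformly distributed in $[0,1]$. Furthermore, as a function of $R$, $K_C(\psi(R))$ is independent of $(W_1, \dots, W_{d-1})$. It therefore suffices to show that $(W_1, \dots, W_{d-1}) \sim U[0,1]^{d-1}$, a proof of which can be found in [7, p. 212].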

The transformation $T: U \mapsto U'$ given in (6) can be interpreted as an analog of Rosenblatt’s transformation $R$ specifically for Archimedean copulas. Both $T$ and $R$ map $d$ random variables one-to-one to $d$ random variables. They can therefore be used in both directions: for generating random variates and for goodness-of-fit tests. The latter use of $T$ is proposed in this paper.

Compared with Rosenblatt’s transformation, this approach has a clear advantage for obtaining the random variables (or their realizations, in the form of given data) $U_i' \sim U[0,1]^d$, $i \in \{1, \dots, n\}$, from $U_i \sim C$, $i \in \{1, \dots, n\}$: it is typically much easier to compute the quantities in (6) than to access the derivatives in (5). One can then proceed as for Rosenblatt’s transformation and use any of the transformations listed in Sect. 2.2 to transform $U_i'$, $i \in \{1, \dots, n\}$, to the one-dimensional quantities $Y_i$, $i \in \{1, \dots, n\}$, for testing $H_0$. A test involving the transformation $T$ to obtain the random vectors $U_i' \sim U[0,1]^d$, $i \in \{1, \dots, n\}$, is referred to as approach “$T_d$” in what follows.
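The following Python sketch implements $T$ from (6) for Clayton’s family, again assuming $\psi(t) = (1+t)^{-1/\theta}$. The last component uses the series for $K_C$ together with $\psi^{-1}(C(U)) = \sum_{k=1}^d \psi^{-1}(U_k)$, so $C(U)$ never has to be formed explicitly. Setting `last=False` returns only the first $d-1$ components, i.e., the approach $T_{d-1}$ discussed next. As a check of Theorem 1, samples are drawn with the Marshall-Olkin algorithm, a standard sampler for Archimedean copulas that differs from the one in [29]. All names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def sample_clayton(n, d, theta, rng):
    """Marshall-Olkin sampler, assuming psi(t) = (1 + t)^(-1/theta), theta > 0."""
    v = rng.gamma(1.0 / theta, 1.0, size=(n, 1))   # V ~ Gamma(1/theta, 1)
    e = rng.exponential(size=(n, d))               # E_j ~ Exp(1), i.i.d.
    return (1.0 + e / v) ** (-1.0 / theta)          # U_j = psi(E_j / V)


def transform_T(u, theta, last=True):
    """Transformation T of Eq. (6) for a Clayton copula.

    Returns the n x d matrix (U'_1, ..., U'_d), or only the first d - 1
    columns if last=False (approach T_{d-1}).
    """
    u = np.atleast_2d(np.asarray(u, dtype=float))
    d = u.shape[1]
    s = np.cumsum(u ** (-theta) - 1.0, axis=1)       # partial sums of psi^{-1}(U_k)
    out = (s[:, :-1] / s[:, 1:]) ** np.arange(1, d)  # U'_j, j = 1, ..., d-1
    if not last:
        return out
    # U'_d = K_C(C(U)) via the series for K_C, with x = psi^{-1}(C(U)) = s_d.
    x, a = s[:, -1], 1.0 / theta
    kc = sum(np.exp(gammaln(a + k) - gammaln(a) - gammaln(k + 1.0))
             * (x / (1.0 + x)) ** k * (1.0 + x) ** (-a) for k in range(d))
    return np.column_stack([out, kc])
```

For example, `transform_T(sample_clayton(1000, 5, 2.0, np.random.default_rng(1)), 2.0)` should return a $1000 \times 5$ matrix whose columns look independent and standard uniform.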

Note that evaluating the transformation $T$ might pose difficulties only for the last component $U_d'$, i.e., the Kendall distribution function $K_C$. In contrast, computing $U_j'$, $j \in \{1, \dots, d-1\}$, is easily achieved for any Archimedean copula with explicit
generator inverse. Furthermore, for large $d$, evaluating $K_C$ often becomes increasingly complicated from a numerical point of view (see [18] for the derivatives involved). Exceptions are specific cases such as Clayton’s family, where all involved derivatives of $\psi$ are directly accessible (see, e.g., [29]). There, $K_C$ can be computed directly via¹ $K_C(t) = \sum_{k=0}^{d-1} \psi^{(k)}(\psi^{-1}(t))\,(-\psi^{-1}(t))^k/k!$; see, e.g., [3] or [22].

Moreover, note that applying $T_d$ to obtain the transformed data $U_i'$, $i \in \{1, \dots, n\}$, requires $n$ evaluations of the Kendall distribution function $K_C$. This can be computationally intensive, especially in simulation studies involving bootstrap procedures. The goodness-of-fit tests following the approaches of Sect. 2.2 lose information anyway. With this in mind, one may therefore suggest omitting the last component of $T$ if $d$ is large and considering only $T_1, \dots, T_{d-1}$, i.e., using the data $(U_{i1}', \dots, U_{i,d-1}')$, $i \in \{1, \dots, n\}$, for testing purposes. This leads to fast goodness-of-fit tests for Archimedean copulas in high dimensions. A goodness-of-fit test based on omitting the last component of the transformation $T$ is referred to as approach “$T_{d-1}$” in what follows.


4 A Large-Scale Simulation Study 
4.1 The Experimental Design 

In our experimental design, the focus is on two features: the error probability of the first kind, i.e., whether a test maintains its nominal level, and the power under several alternatives. To distinguish between the different approaches, we use either pairs or triples. For example, the approach “$(T_{d-1}, N_{d-1}, \mathrm{AD})$” denotes a goodness-of-fit test that proceeds in three steps: it first applies our proposed transformation $T$ without the last component, then uses the approach based on the $\chi^2_{d-1}$ distribution to transform the data to a one-dimensional setup, and finally applies the Anderson-Darling statistic to test $H_0$. Similarly, “$(T_{d-1}, S_{n,d-1})$” denotes a goodness-of-fit test which uses the approach $S_{n,d-1}$ for reducing the dimension and testing $H_0$.

In the conducted Monte Carlo simulation,² the following ten different goodness-of-fit approaches are tested:


1 It also follows from this formula that $K_C$ converges pointwise to the unit jump at zero for $d \to \infty$.

2 All computations were conducted on a compute node (part of the bwGRiD Cluster Ulm) which consists of eight cores (two four-core Intel Xeon E5440 Harpertown CPUs with 2.83 GHz and 6 MB second-level cache) and 16 GB memory. The algorithms are implemented in C/C++ and compiled using GCC 4.2.4 with option -O2 for code optimization. Moreover, we use the algorithms of the Numerical Algorithms Group, the GNU Scientific Library 1.12, and the OpenMaple interface of Maple 12. For generating uniform random variates, an implementation of the Mersenne Twister by [28] is used. For the Anderson-Darling test, the procedures suggested in [21] are used.
$$\begin{gathered} (T_{d-1}, N_{d-1}, \chi^2),\ (T_{d-1}, N_{d-1}, \mathrm{AD}),\ (T_{d-1}, S_{n,d-1}),\ (K_C, \chi^2),\ (K_C, \mathrm{AD}),\\ (T_d, N_d, \mathrm{AD}),\ (T_d, K_\Pi, \mathrm{AD}),\ (T_d, S_{n,d}),\ (R, N_d, \mathrm{AD}),\ (R, S_{n,d}). \end{gathered} \qquad (7)$$

Similar to [14], we investigate samples of size $n = 150$ and parameters of the copulas such that Kendall’s tau equals $\tau = 0.25$. We work in $d = 5$ and $d = 20$ dimensions for comparing the goodness-of-fit tests given in (7). For every scenario, we simulate the corresponding Archimedean copulas of Ali-Mikhail-Haq (“A”), Clayton (“C”), Frank (“F”), Gumbel (“G”), and Joe (“J”), see, e.g., [15]. We also simulate the Gaussian (“Ga”) and $t$ copula with four degrees of freedom (“$t_4$”). Note that we use one-parameter copulas ($p = 1$) in our study only for simplicity. Whenever computationally feasible, $N = 1{,}000$ replications are used for computing the empirical level and power. In some cases (see Sect. 5), fewer than 1,000 replications had to be used. For all tests, the significance level is fixed at $\alpha = 5\,\%$. For the univariate $\chi^2$ tests, ten cells were used.
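Choosing the parameters such that Kendall’s tau equals $\tau = 0.25$ requires inverting each family’s Kendall’s tau formula. The sketch below uses the well-known expressions: Clayton $\theta/(\theta+2)$, Gumbel $1 - 1/\theta$, Frank via the Debye function $D_1$, and Ali-Mikhail-Haq and Joe via their standard closed form and series. It then applies SciPy’s `brentq` root finder. The root brackets and the truncation of Joe’s series are assumptions of this sketch, not of the study. For the Gaussian and $t_4$ copulas, $\rho = \sin(\pi\tau/2)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

_K = np.arange(1, 100001, dtype=float)  # truncation of Joe's series (assumption)


def _debye1(th):
    """D_1(theta) = (1/theta) int_0^theta t / (e^t - 1) dt."""
    return quad(lambda t: t / np.expm1(t) if t > 0 else 1.0, 0.0, th)[0] / th


TAU = {
    "A": lambda th: 1.0 - 2.0 * (th + (1.0 - th) ** 2 * np.log1p(-th)) / (3.0 * th ** 2),
    "C": lambda th: th / (th + 2.0),
    "F": lambda th: 1.0 + 4.0 * (_debye1(th) - 1.0) / th,
    "G": lambda th: 1.0 - 1.0 / th,
    "J": lambda th: 1.0 - 4.0 * np.sum(1.0 / (_K * (th * _K + 2.0) * (th * (_K - 1.0) + 2.0))),
}

# Root brackets (assumed wide enough for tau = 0.25).
BRACKET = {"A": (0.01, 0.999), "C": (0.01, 50.0), "F": (0.01, 50.0),
           "G": (1.0001, 50.0), "J": (1.0001, 50.0)}


def theta_from_tau(family, tau):
    """Solve TAU[family](theta) = tau for theta by Brent's method."""
    lo, hi = BRACKET[family]
    return brentq(lambda th: TAU[family](th) - tau, lo, hi)


def rho_from_tau(tau):
    """Gaussian and t copulas: tau = (2/pi) arcsin(rho)."""
    return np.sin(np.pi * tau / 2.0)
```

For instance, `theta_from_tau("C", 0.25)` gives $2/3$ and `theta_from_tau("G", 0.25)` gives $4/3$.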

Concerning the use of Maple, we proceed as follows. Some approaches listed in (7) require the first $d-1$ components $T_1, \dots, T_{d-1}$ of the transformation $T$: the first three and the sixth to eighth. For computing these components, Maple is only used if working in double precision in C/C++ leads to errors. By errors, we mean non-float values, including nan, -inf, and inf, as well as float values less than zero or greater than one. For computing the component $T_d$, Maple is used to generate C/C++ code. To decrease runtime, the function is then hard coded in C/C++. The exception is Clayton’s family, where an explicit form of all derivatives, and hence of $K_C$, is known; see [29]. The same holds for computing $K_C$ for the approaches $(K_C, \chi^2)$ and $(K_C, \mathrm{AD})$.

For the approaches involving Rosenblatt’s transformation, a direct computation in C/C++ is possible for Clayton’s family. For all other copula families, Maple’s code generator is again used to obtain the derivatives of the generator. If this approach produces numerical errors, we use Maple with high precision for the computation. If Rosenblatt’s transformation produces errors even after computations in Maple, we disregard the corresponding goodness-of-fit test and use the remaining test results of the simulation for computing the empirical level and power.

Due to its well-known properties, we use the maximum likelihood estimator (“MLE”) to estimate the copula parameters, based on the pseudo-observations of the simulated random vectors $U_i \sim C$, $i \in \{1, \dots, n\}$. Besides the use of pseudo-observations, parameter estimation also affects the null distribution. This is generally addressed by using a bootstrap procedure for accessing the correct null distribution; see Sect. 4.2 below. Note that a bootstrap can be quite time-consuming in high dimensions, where even parameter estimation alone turns out to be computationally demanding. For the bootstrap versions of the goodness-of-fit approaches involving the generator derivatives, we had to hard code the derivatives in order to decrease runtime. Such effort is not needed for applying our proposed goodness-of-fit test $(T_{d-1}, N_{d-1}, \mathrm{AD})$, since it does not require access to the generator derivatives.
4.2 The Parametric Bootstrap

For our proposed approach $(T_{d-1}, N_{d-1}, \mathrm{AD})$, it is not clear whether the bootstrap procedure is valid from a theoretical point of view; see, e.g., [8] and [14]. However, the empirical results presented in Sect. 5 indicate the validity of this approach, which is described as follows.

1. Given the data $X_i$, $i \in \{1, \dots, n\}$, build the pseudo-observations $U_i$, $i \in \{1, \dots, n\}$, as given in (3) and estimate the unknown copula parameter vector $\theta$ by its MLE $\hat\theta_n$.

2. Based on $U_i$, $i \in \{1, \dots, n\}$, the given Archimedean family, and the parameter estimate $\hat\theta_n$, compute the first $d-1$ components $U_{ij}'$, $i \in \{1, \dots, n\}$, $j \in \{1, \dots, d-1\}$, of the transformation $T$ as in Eq. (6). Then compute the one-dimensional quantities $Y_i = \sum_{j=1}^{d-1} \bigl(\Phi^{-1}(U_{ij}')\bigr)^2$, $i \in \{1, \dots, n\}$, and the Anderson-Darling test statistic
$$A_n = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\Bigl[\log\bigl(F_{\chi^2_{d-1}}(Y_{(i)})\bigr) + \log\bigl(1 - F_{\chi^2_{d-1}}(Y_{(n-i+1)})\bigr)\Bigr].$$

3. Choose the number $M$ of bootstrap replications. For each $k \in \{1, \dots, M\}$, do:

a. Generate a random sample of size $n$ from the given Archimedean copula with parameter $\hat\theta_n$ and compute the corresponding vectors of componentwise scaled ranks (i.e., the pseudo-observations) $U_{i,k}^*$, $i \in \{1, \dots, n\}$. Then, estimate the unknown parameter vector $\theta$ by $\hat\theta_{n,k}^*$.

b. Based on $U_{i,k}^*$, $i \in \{1, \dots, n\}$, the given Archimedean family, and the parameter estimate $\hat\theta_{n,k}^*$, compute the first $d-1$ components $U_{ij,k}^{\prime *}$, $i \in \{1, \dots, n\}$, $j \in \{1, \dots, d-1\}$, of the transformation $T$ as in Eq. (6). Then compute $Y_{i,k}^* = \sum_{j=1}^{d-1} \bigl(\Phi^{-1}(U_{ij,k}^{\prime *})\bigr)^2$, $i \in \{1, \dots, n\}$, and the Anderson-Darling test statistic
$$A_{n,k}^* = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\Bigl[\log\bigl(F_{\chi^2_{d-1}}(Y_{(i),k}^*)\bigr) + \log\bigl(1 - F_{\chi^2_{d-1}}(Y_{(n-i+1),k}^*)\bigr)\Bigr].$$

4. An approximate $p$-value for $(T_{d-1}, N_{d-1}, \mathrm{AD})$ is given by $\frac{1}{M}\sum_{k=1}^{M} \mathbb{1}\{A_{n,k}^* > A_n\}$.
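The four steps can be sketched in Python for Clayton’s family, with assumed generator $\psi(t) = (1+t)^{-1/\theta}$. The sketch makes three further assumptions:

- The pseudo-observations are the scaled ranks $R_{ij}/(n+1)$, which we take to be the form given in (3).
- The MLE maximizes the closed-form Clayton density $c(u) = \prod_{k=0}^{d-1}(1+k\theta)\prod_{j=1}^d u_j^{-(1+\theta)}\bigl(\sum_{j=1}^d u_j^{-\theta} - d + 1\bigr)^{-(d+1/\theta)}$ over a bounded interval.
- Bootstrap samples are drawn with the Marshall-Olkin algorithm.

This is an illustrative sketch, not the C/C++ implementation used in the study; all function names and the optimizer bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, norm, rankdata


def pseudo_obs(x):
    """Componentwise scaled ranks R_ij / (n + 1) (assumed form of Eq. (3))."""
    x = np.asarray(x, dtype=float)
    return rankdata(x, axis=0) / (x.shape[0] + 1.0)


def sample_clayton(n, d, theta, rng):
    """Marshall-Olkin sampler for psi(t) = (1 + t)^(-1/theta)."""
    v = rng.gamma(1.0 / theta, 1.0, size=(n, 1))
    return (1.0 + rng.exponential(size=(n, d)) / v) ** (-1.0 / theta)


def clayton_loglik(theta, u):
    """Log-likelihood of the d-dimensional Clayton density at the rows of u."""
    n, d = u.shape
    s = np.sum(u ** (-theta), axis=1) - d + 1.0
    return (n * np.sum(np.log1p(theta * np.arange(d)))
            - (1.0 + theta) * np.sum(np.log(u))
            - (d + 1.0 / theta) * np.sum(np.log(s)))


def clayton_mle(u, bounds=(1e-4, 50.0)):
    """Maximize the log-likelihood over theta within the given bounds."""
    res = minimize_scalar(lambda th: -clayton_loglik(th, u), bounds=bounds, method="bounded")
    return res.x


def ad_statistic_td1(u, theta):
    """A_n of steps 2 and 3b: T_{d-1}, then N_{d-1}, then Anderson-Darling."""
    n, d = u.shape
    s = np.cumsum(u ** (-theta) - 1.0, axis=1)
    t = (s[:, :-1] / s[:, 1:]) ** np.arange(1, d)          # U'_ij, j < d
    y = chi2.cdf(np.sum(norm.ppf(t) ** 2, axis=1), df=d - 1)
    y = np.clip(np.sort(y), 1e-15, 1.0 - 1e-15)            # guard the logarithms
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) * (np.log(y) + np.log1p(-y[::-1]))) / n


def bootstrap_pvalue(x, M=1000, seed=None):
    """Approximate p-value of (T_{d-1}, N_{d-1}, AD) for a Clayton H_0."""
    rng = np.random.default_rng(seed)
    u = pseudo_obs(x)                                       # step 1
    n, d = u.shape
    theta_n = clayton_mle(u)
    a_n = ad_statistic_td1(u, theta_n)                      # step 2
    a_star = np.empty(M)
    for k in range(M):                                      # step 3
        u_k = pseudo_obs(sample_clayton(n, d, theta_n, rng))
        a_star[k] = ad_statistic_td1(u_k, clayton_mle(u_k))
    return float(np.mean(a_star > a_n))                     # step 4
```

Each replication re-estimates $\theta$, as in step 3a. This mirrors how the bootstrap accounts for parameter estimation in the null distribution.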

The bootstrap procedures for the other approaches can be obtained similarly. For the bootstrap procedure using Rosenblatt’s transformation, see, e.g., [14]. For our simulation studies, we used $M = 1{,}000$ bootstrap replications. Note that, together with the number $N = 1{,}000$ of test replications, simulation studies are quite time-consuming, especially if parameters need to be estimated and high dimensions are involved.

Applying the MLE in high dimensions is numerically challenging and time-consuming; see also [19]. Although our proposed goodness-of-fit test can be applied in the case $d = 100$, it is not easy to use the bootstrap described above in such high dimensions. For $d = 100$, we therefore investigate only the error probability of the first kind, similar to the case A addressed in [8]. For this, we generate $N = 1{,}000$ 100-dimensional samples of size $n = 150$ with parameter chosen such that Kendall’s tau equals $\tau = 0.25$. For each generated data set, we compute the $p$-value of the test $(T_{d-1}, N_{d-1}, \mathrm{AD})$ as before, but this time with the known copula parameter. Finally, we report the number of rejections at the 5 % level among the 1,000 conducted goodness-of-fit tests. The results are given at the end of Sect. 5.


5 Results 


We first present selected results obtained from the large-scale simulation study con- 
ducted for the 10 different goodness-of-fit approaches listed in (7). These results sum- 
marize the main characteristics found in the simulation study. As an overall result, 
we found that the empirical power against all investigated alternatives increases if 
the dimension gets large. As expected, so does runtime. 

We start by discussing the methods that show a comparably weak performance in the conducted simulation study, beginning with those that use the test statistics $S_{n,d-1}$ or $S_{n,d}$ to reduce the dimension. The goodness-of-fit tests $(T_{d-1}, S_{n,d-1})$, $(T_d, S_{n,d})$, and $(R, S_{n,d})$ keep the error probability of the first kind. However, they show a comparably weak performance against the investigated alternatives, at least in our test setup as described in Sect. 4.1. For example, for $n = 150$, $d = 5$, and $\tau = 0.25$, the method $(T_d, S_{n,d})$ leads to the following empirical powers:

- 5.2 % for testing Clayton’s copula when the simulated copula is Ali-Mikhail-Haq’s,
- 11.5 % for testing the Gaussian copula on Frank copula data,
- 7.7 % for testing Ali-Mikhail-Haq’s copula on data from Frank’s copula, and
- 6.4 % for testing Gumbel’s copula on data from Joe’s copula.

The results are similar for the methods $(T_{d-1}, S_{n,d-1})$ and $(R, S_{n,d})$. We therefore do not further report on the methods involving $S_{n,d-1}$ or $S_{n,d}$ in what follows. The method $(T_d, K_\Pi, \mathrm{AD})$ also shows a rather weak performance for both investigated dimensions and is therefore omitted. The results of $(K_C, \chi^2)$ and $(K_C, \mathrm{AD})$ do not differ significantly, and neither do those of $(T_{d-1}, N_{d-1}, \mathrm{AD})$ and $(T_{d-1}, N_{d-1}, \chi^2)$. We therefore only report the results based on the Anderson-Darling tests.

Now consider the goodness-of-fit testing approaches $(T_{d-1}, N_{d-1}, \mathrm{AD})$, $(K_C, \mathrm{AD})$, and $(T_d, N_d, \mathrm{AD})$. Recall what each one applies:

- $(T_{d-1}, N_{d-1}, \mathrm{AD})$ is based on the first $d-1$ components of the transformation $T$ in Eq. (6).
- $(K_C, \mathrm{AD})$ applies only the last component of $T$.
- $(T_d, N_d, \mathrm{AD})$ applies the whole transformation $T$ in $d$ dimensions.

All three approaches use the Anderson-Darling test for testing $H_0$. The test results for the three goodness-of-fit tests with $n = 150$, $\tau = 0.25$, and $d \in \{5, 20\}$ are reported in Tables 1, 2, and 3, respectively. As mentioned above, we use a bootstrap procedure to obtain approximate $p$-values and test the hypothesis based on those $p$-values. We use $N = 1{,}000$ repetitions wherever possible. In all cases involving Joe’s copula as $H_0$ copula, only about 650 repetitions could be finished.

As Tables 1 and 2 reveal, in many cases $(T_{d-1}, N_{d-1}, \mathrm{AD})$ shows a larger empirical power than $(K_C, \mathrm{AD})$ (for both $d$). However, the differences in either direction can be large. Consider, for example, testing the Clayton copula when the true one is $t_4$ (both $d$), and testing the Frank copula when the true one is Clayton (both $d$). Overall,
Table 1 Empirical power in % for $(T_{d-1}, N_{d-1}, \mathrm{AD})$ based on $N = 1{,}000$ replications with $n = 150$, $\tau = 0.25$, and $d = 5$ (left), respectively $d = 20$ (right)

H_0 |   True copula, d = 5                      |   True copula, d = 20
    |     A     C     F     G     J    Ga   t_4 |     A     C     F     G     J    Ga   t_4
A   |   4.8  10.5  68.5  97.8 100.0  34.2  94.0 |   5.2   4.8  98.1  97.8 100.0  47.2 100.0
C   |  35.4   4.7  92.8  99.6 100.0  84.2 100.0 |  95.3   6.1 100.0 100.0 100.0 100.0 100.0
F   |   2.9  10.5   5.3  58.5  94.8  15.8  99.4 |   0.3  12.8   5.4  63.5 100.0  77.6 100.0
G   |  24.5  56.6   8.9   5.2  10.3  17.0  99.3 |  99.4 100.0  24.9   5.2  77.0 100.0 100.0
J   |  71.7  92.9  41.1  13.7   4.9  76.4 100.0 |  98.6  98.4  84.4   6.9   5.2 100.0 100.0


Table 2 Empirical power in % for $(K_C, \mathrm{AD})$ based on $N = 1{,}000$ replications with $n = 150$, $\tau = 0.25$, and $d = 5$ (left), respectively $d = 20$ (right)

H_0 |   True copula, d = 5                      |   True copula, d = 20
    |     A     C     F     G     J    Ga   t_4 |     A     C     F     G     J    Ga   t_4
A   |   6.1  33.7  13.5  38.3  83.6  11.5  44.4 |   4.2  16.8   0.0   1.7   8.9  59.5  82.4
C   |  30.6   5.1  95.5  86.9  99.3  28.8   7.7 |  65.9   5.6 100.0  99.8 100.0  45.5   4.1
F   |  41.4  97.6   4.0  63.7  59.5  48.1  88.9 |  90.0 100.0   5.2  99.9 100.0  98.5 100.0
G   |  12.0  24.3  41.1   4.9   5.4   6.9  16.3 |   9.5  56.8  93.0   6.5  60.7   1.3   8.3
J   |  70.1  50.5  70.5   3.0   5.5  29.0  12.8 | 100.0 100.0  99.8   1.8   6.7 100.0 100.0


Table 3 Empirical power in % for $(T_d, N_d, \mathrm{AD})$ based on $N = 1{,}000$ replications with $n = 150$, $\tau = 0.25$, and $d = 5$ (left), respectively $d = 20$ (right)

H_0 |   True copula, d = 5                      |   True copula, d = 20
    |     A     C     F     G     J    Ga   t_4 |     A     C     F     G     J    Ga   t_4
A   |   4.2   8.4  36.4  83.1  99.7  21.6  98.4 |   5.3  16.2  98.0  96.6 100.0  68.8 100.0
C   |   6.9   4.7  16.9  65.9  90.2  25.3 100.0 |  86.3   5.3  99.7  99.9 100.0 100.0 100.0
F   |   4.4   3.1   4.9  16.7  46.1   9.1  99.2 |   0.4   5.6   5.0  30.8 100.0  25.9 100.0
G   |   3.8   5.8   1.8   5.0  15.8   3.7  98.7 |  94.7 100.0   8.2   7.1  85.3  98.6 100.0
J   |  11.1  17.5   6.4   4.8   4.8  10.8  99.7 | 100.0 100.0  74.8   3.5   5.3  98.7 100.0


when the true copula is the $t_4$ copula, $(T_{d-1}, N_{d-1}, \mathrm{AD})$ performs well. Given the comparably simple numerical form of $(T_{d-1}, N_{d-1}, \mathrm{AD})$, this method can be quite useful. Interestingly, comparing Table 1 with Table 3 shows that applying the transformation $T$ with all $d$ components actually causes a loss in power for the majority of families tested (the cause of this behavior remains an open question). Note that in Table 2, for the case where the Ali-Mikhail-Haq copula is tested, the power decreases in comparison to the five-dimensional case. This might be due to numerical difficulties occurring when $K_C$ is evaluated in this case, since the same behavior is visible for the method $(K_C, \chi^2)$.

Table 4 shows the empirical power of the method $(R, N_d, \mathrm{AD})$. In comparison to our proposed goodness-of-fit approach $(T_{d-1}, N_{d-1}, \mathrm{AD})$, the approach $(R, N_d, \mathrm{AD})$ performs worse overall. For $d = 5$, there are only two cases where
Table 4 Empirical power in % for $(R, N_d, \mathrm{AD})$ based on $N = 1{,}000$ replications with $n = 150$, $\tau = 0.25$, and $d = 5$ (left), respectively $d = 20$ (right)

H_0 |   True copula, d = 5                      |   True copula, d = 20
    |     A     C     F     G     J    Ga   t_4 |     A     C     F     G     J    Ga   t_4
A   |   4.5   8.9  46.9  79.1  98.8  11.0  94.2 |     *     *     *     *     *     *     *
C   |  11.7   5.0  17.7  53.5  68.8  10.4  99.7 |  93.4   5.3 100.0 100.0 100.0 100.0 100.0
F   |   3.4   2.6   5.5  15.8  61.6   5.7  99.5 |     -     -     -     -     -     -     -
G   |   4.9   4.0   1.2   3.0  14.5   1.2  97.9 |     -     -     -     -     -     -     -
J   |  21.1  21.8   9.5   4.3   3.6   7.2  99.7 |     -     -     -     -     -     -     -


$(R, N_d, \mathrm{AD})$ performs better than $(T_{d-1}, N_{d-1}, \mathrm{AD})$: testing the Ali-Mikhail-Haq copula when the true copula is $t_4$, and testing Gumbel’s copula when the true one is Joe’s. In the high-dimensional case $d = 20$, results are obtained only for the Clayton copula, and here the actual number of repetitions for calculating the empirical power is approximately 500. For testing the Ali-Mikhail-Haq, Gumbel, Frank, or Joe copula, no reliable results were obtained, since only about 20 repetitions could be run in the runtime provided by the grid. This is due to the high-order derivatives involved in this transformation, which slow down computations considerably; see [19] for more details.

Another aspect, especially in a high-dimensional setup, is numerical precision. In going from the low- to the high-dimensional case, we faced several problems during our computations. For example, the approach $(R, N_d, \mathrm{AD})$ shows difficulties in testing the $H_0$ copula of Ali-Mikhail-Haq for $d = 20$. Even after applying Maple (with Digits set to 15; the default is 10), the goodness-of-fit tests indicated numerical problems. The numerical issues that appear in the testing approaches $(K_C, \mathrm{AD})$ and $(T_d, N_d, \mathrm{AD})$ when evaluating the Kendall distribution function were already mentioned earlier, e.g., in Sect. 4.1. In principle, one could be tempted to choose a (much) higher precision than standard double precision in order to obtain more reliable testing results. However, this significantly increases runtime, and under such a setup, applying a bootstrap procedure would no longer be possible. In high dimensions, only the approaches $(T_{d-1}, N_{d-1}, \mathrm{AD})$ and $(T_{d-1}, N_{d-1}, \chi^2)$ can be applied without computational difficulties regarding precision and runtime.

Concerning the case $d = 100$, we checked whether the error probability of the first kind at the 5 % level is kept. Using the procedure described at the end of Sect. 4.2, we obtained 4.6, 4.2, 5.0, 5.5, and 4.9 % for the families of Ali-Mikhail-Haq, Clayton, Frank, Gumbel, and Joe, respectively.
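To judge whether such empirical levels are compatible with the nominal 5 %, one can compare them with the Monte Carlo error of a rejection rate based on $N = 1{,}000$ replications. Under $H_0$, the number of rejections is binomial with success probability $0.05$. By the normal approximation, an approximate 95 % band is therefore $0.05 \pm 1.96\sqrt{0.05 \cdot 0.95/1000} \approx [3.65\,\%, 6.35\,\%]$. The small Python sketch below computes this band and the empirical rejection rate from a list of $p$-values; the names are hypothetical. This check is an illustration and not part of the reported study.

```python
import math


def mc_band(alpha=0.05, n_rep=1000, z=1.959964):
    """Approximate 95% band for an empirical rejection rate under H_0."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)  # binomial standard error
    return alpha - z * se, alpha + z * se


def empirical_rejection_rate(p_values, alpha=0.05):
    """Fraction of tests rejected at level alpha (empirical level or power)."""
    return sum(p <= alpha for p in p_values) / len(p_values)


lo, hi = mc_band()
# Empirical levels for d = 100 as reported in the text above.
reported = {"A": 0.046, "C": 0.042, "F": 0.050, "G": 0.055, "J": 0.049}
inside = {fam: lo <= r <= hi for fam, r in reported.items()}
```

All five reported rates for $d = 100$ lie inside this band.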


6 A Graphical Goodness-of-fit Test 


A plot often provides more information than a single $p$-value; e.g., it can be used to determine where deviations from uniformity are located. See [16], who advocate graphical goodness-of-fit tests in higher dimensions. We now briefly apply the transformation $T: U \mapsto U'$ addressed in Theorem 1 to graphically check how well the transformed variates indeed follow a uniform distribution. Figures 1, 2, and 3 show scatter-plot matrices of 1,000 generated three-dimensional vectors of random variates, transformed with $T$ under various assumed models (the captions are self-explanatory). Since $K_C$ is easily computed in three dimensions, we also use this last component of $T$.

Fig. 1 Data from a Gaussian (left) and $t_4$ (right) copula with parameter chosen such that Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.5. The deviations from uniformity are small but visible, especially in the corners of the different panels



Fig. 2 Data from a Clayton (left) and Gumbel (right) copula with parameter chosen such that 
Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s 
tau equals 0.5. The deviation from uniformity for the Clayton data is clearly visible. Since the 
Gumbel data is transformed with the correct family and parameter, the resulting variates are indeed 
uniformly distributed in the unit hypercube 
Fig. 3 Data from a Gumbel copula with parameter chosen such that Kendall’s tau equals 0.5,
transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.2 (left) and 0.8 
(right), respectively. Deviations from uniformity are easily visible 


7 Conclusion and Discussion 


Goodness-of-fit tests for Archimedean copulas that are also suited to high dimensions were presented. The proposed tests are based on a transformation $T$ whose inverse is known for generating random variates. The tests can therefore be viewed as analogs of tests based on Rosenblatt’s transformation, whose inverse is also used for sampling (known as the conditional distribution method). The suggested goodness-of-fit tests proceed in two steps. In the first step, the first $d-1$ components of $T$ are applied. They provide a fast and simple transformation from $d$ to $d-1$ dimensions. This complements known goodness-of-fit tests that use only the $d$th component of $T$, the Kendall distribution function, but which require knowledge of the generator derivatives. In the second step, the $d-1$ components are mapped to one-dimensional quantities, which simplifies testing. This second step is common to many goodness-of-fit tests, and hence any such test can be applied.

The power of the proposed testing approach was compared to other known 
goodness-of-fit tests in a large-scale simulation study. In this study, goodness-of- 
fit tests in comparably high dimensions were investigated. The computational effort 
(precision, runtime) involved in applying commonly known testing procedures turned 
out to be tremendous. The results obtained from these tests in higher dimensions have to be handled with care: numerical issues for the methods for which not all repetitions could be run without problems might have introduced a bias. Applying commonly known goodness-of-fit tests in higher dimensions requires (much) more work in the future, especially on the numerical side. Computational tools which systematically
check for numerical inaccuracies and which are implemented on the paradigm of 
defensive programming might provide a solution here; see [17] for a first work in 
this direction. 
In contrast, our proposed approach is easily applied in any dimension, and its evaluation requires only moderate numerical precision. Due to the short runtimes, it could also be investigated with a bootstrap procedure, where it showed good performance in high dimensions. Furthermore, it easily extends to the multiparameter case. The approach is not robust with respect to permutations of the arguments. To reduce this effect, one could randomize the order of the data dimensions, as is done for Rosenblatt’s transformation; see [4].

Finally, a graphical goodness-of-fit test was outlined. This is a rather promising field of research for high-dimensional data. Especially in high dimensions, none of the existing models fits perfectly. A graphical assessment of which parts (or dimensions) of the model fit well and which do not is therefore in general preferable to a single $p$-value.
Duality in Risk Aggregation


Raphael Hauser, Sergey Shahverdyan and Paul Embrechts 


Abstract A fundamental problem in risk management is the robust aggregation of 
different sources of risk in a situation where little or no data are available to infer 
information about their dependencies. A popular approach to solving this problem 
is to formulate an optimization problem under which one maximizes a risk measure 
over all multivariate distributions that are consistent with the available data. In sev- 
eral special cases of such models, there exist dual problems that are easier to solve 
or approximate, yielding robust bounds on the aggregated risk. In this chapter, we 
formulate a general optimization problem, which can be seen as a doubly infinite lin- 
ear programming problem, and we show that the associated dual generalizes several 
well-known special cases and extends to new risk management models we propose. 


1 Introduction 

An important problem in quantitative risk management is to aggregate several indi- 
vidually studied types of risks into an overall position. Mathematically, this translates 
into studying the worst-case distribution tails of $\Psi(X)$, where $\Psi: \mathbb{R}^n \to \mathbb{R}$ is a given function that represents the risk (or undesirability) of an outcome, and where $X$ is a random vector that takes values in $\mathbb{R}^n$ and whose distribution is only partially known.
For example, one may only have information about the marginals of X and possibly 
partial information about some of the moments. 

To solve such problems, duality is often exploited, as the dual may be easier to
approach numerically or analytically [2-5, 14]. Being able to formulate a dual is also
important in cases where the primal is approachable algorithmically, as solving the
primal and dual problems jointly provides an approximation guarantee throughout
the run of a solver: if the duality gap (the difference between the primal and dual
objective values) falls below a chosen threshold relative to the primal objective, the
algorithm can be stopped with a guarantee of approximating the optimum to a fixed
precision that depends on the chosen threshold. This is a well-known technique in
convex optimization, see, e.g., [1].

Although for some special cases of the marginal problem analytic solutions and
powerful numerical heuristics exist [6, 12, 13, 18, 19], these techniques do not apply
when additional constraints are imposed to force the probability measures over which
we maximize the risk to conform with empirical observations: in a typical case, the
bulk of the empirical data may be contained in a region $D$ that can be approximated
by an ellipsoid or the union of several (disjoint or overlapping) polyhedra. For a
probability measure $\mu$ to be considered a reasonable explanation of the true
distribution of (multidimensional) losses, one would require the probability mass contained
in $D$ to lie in an empirically estimated confidence region, that is, $\ell \le \mu(D) \le u$
for some estimated bounds $\ell < u$. In such a situation, the derivation of robust risk
aggregation bounds via dual problems remains a powerful and interesting approach.

In this chapter, we formulate a general optimization problem, which can be seen as
a doubly infinite linear programming problem, and we show that the associated dual
generalizes several well-known special cases. We then apply this duality framework
to a new class of risk management models we propose in Sect. 4.


2 A General Duality Relation 

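Let $(\Phi, \mathfrak{F})$, $(\Gamma, \mathfrak{G})$ and $(\Sigma, \mathfrak{S})$ be complete measure spaces, and let
$A: \Gamma \times \Phi \to \mathbb{R}$, $a: \Gamma \to \mathbb{R}$, $B: \Sigma \times \Phi \to \mathbb{R}$, $b: \Sigma \to \mathbb{R}$, and $c: \Phi \to \mathbb{R}$ be bounded measurable
functions on these spaces and the corresponding product spaces. Let $\mathcal{M}(\Phi, \mathfrak{F})$,
$\mathcal{M}(\Gamma, \mathfrak{G})$ and $\mathcal{M}(\Sigma, \mathfrak{S})$ be the sets of signed measures with finite variation on
$(\Phi, \mathfrak{F})$, $(\Gamma, \mathfrak{G})$ and $(\Sigma, \mathfrak{S})$, respectively. We now consider the following pair of
optimization problems over $\mathcal{M}(\Phi, \mathfrak{F})$ and $\mathcal{M}(\Gamma, \mathfrak{G}) \times \mathcal{M}(\Sigma, \mathfrak{S})$, respectively,

$$
\begin{aligned}
\text{(P)}\quad \sup_{\mathcal{F} \in \mathcal{M}(\Phi, \mathfrak{F})}\ & \int_\Phi c(x)\, d\mathcal{F}(x)\\
\text{s.t.}\ & \int_\Phi A(y, x)\, d\mathcal{F}(x) \le a(y), \quad (y \in \Gamma),\\
& \int_\Phi B(z, x)\, d\mathcal{F}(x) = b(z), \quad (z \in \Sigma),\\
& \mathcal{F} \ge 0,
\end{aligned}
$$

and

$$
\begin{aligned}
\text{(D)}\quad \inf_{(\mathcal{G}, \mathcal{S}) \in \mathcal{M}(\Gamma, \mathfrak{G}) \times \mathcal{M}(\Sigma, \mathfrak{S})}\ & \int_\Gamma a(y)\, d\mathcal{G}(y) + \int_\Sigma b(z)\, d\mathcal{S}(z),\\
\text{s.t.}\ & \int_\Gamma A(y, x)\, d\mathcal{G}(y) + \int_\Sigma B(z, x)\, d\mathcal{S}(z) \ge c(x), \quad (x \in \Phi),\\
& \mathcal{G} \ge 0.
\end{aligned}
$$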

We claim that the infinite-programming problems (P) and (D) are duals of each other. 

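Theorem 1 (Weak Duality) For every (P)-feasible measure $\mathcal{F}$ and every (D)-feasible
pair $(\mathcal{G}, \mathcal{S})$, we have

$$ \int_\Phi c(x)\, d\mathcal{F}(x) \le \int_\Gamma a(y)\, d\mathcal{G}(y) + \int_\Sigma b(z)\, d\mathcal{S}(z). $$

Proof Using Fubini's Theorem, we have

$$
\begin{aligned}
\int_\Phi c(x)\, d\mathcal{F}(x) &\le \int_{\Gamma \times \Phi} A(y, x)\, d(\mathcal{G} \times \mathcal{F})(y, x) + \int_{\Sigma \times \Phi} B(z, x)\, d(\mathcal{S} \times \mathcal{F})(z, x)\\
&\le \int_\Gamma a(y)\, d\mathcal{G}(y) + \int_\Sigma b(z)\, d\mathcal{S}(z),
\end{aligned}
$$

where the first inequality uses the (D)-constraint together with $\mathcal{F} \ge 0$, and the second
uses the (P)-constraints together with $\mathcal{G} \ge 0$.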
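When $\Phi$, $\Gamma$ and $\Sigma$ are finite sets, measures reduce to weight vectors, integrals to sums, and (P), (D) become a primal-dual pair of ordinary linear programs. The sketch below is our own illustration, not taken from the chapter: it draws random data for which a given $\mathcal{F} \ge 0$ is (P)-feasible and a given pair $(\mathcal{G}, \mathcal{S})$ with $\mathcal{G} \ge 0$ is (D)-feasible by construction, and then compares the two objective values as in Theorem 1. The gap equals $\sum_y \mathcal{G}(y)$ times the slack of the $y$-th (P)-constraint plus $\sum_x \mathcal{F}(x)$ times the slack of the $x$-th (D)-constraint, which is why it cannot be negative. The function name and the sizes are arbitrary choices.

```python
import random


def weak_duality_instance(n_phi=6, n_gamma=3, n_sigma=2, seed=0):
    """Finite instance of (P)/(D): Phi, Gamma, Sigma are finite index sets,
    measures are weight vectors and integrals are sums. F is (P)-feasible and
    (G, S) is (D)-feasible by construction. Returns (primal, dual) objectives."""
    rng = random.Random(seed)
    X, Y, Z = range(n_phi), range(n_gamma), range(n_sigma)
    A = [[rng.uniform(-1, 1) for _ in X] for _ in Y]
    B = [[rng.uniform(-1, 1) for _ in X] for _ in Z]
    F = [rng.uniform(0, 1) for _ in X]          # F >= 0
    G = [rng.uniform(0, 1) for _ in Y]          # G >= 0
    S = [rng.uniform(-1, 1) for _ in Z]         # S is a signed measure
    # Right-hand sides chosen so that F satisfies the (P)-constraints (slack in a).
    a = [sum(A[y][x] * F[x] for x in X) + rng.uniform(0, 1) for y in Y]
    b = [sum(B[z][x] * F[x] for x in X) for z in Z]
    # Objective c chosen so that (G, S) satisfies the (D)-constraint with slack.
    c = [sum(A[y][x] * G[y] for y in Y) + sum(B[z][x] * S[z] for z in Z)
         - rng.uniform(0, 1) for x in X]
    primal = sum(c[x] * F[x] for x in X)
    dual = sum(a[y] * G[y] for y in Y) + sum(b[z] * S[z] for z in Z)
    return primal, dual
```

Every seed should give `primal <= dual`, up to floating-point rounding.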


In various special cases, such as those discussed in Sect. 3, strong duality is known
to hold subject to regularity assumptions, that is, the optimal values of (P) and (D)
coincide. Another special case under which strong duality applies is when the
measures $\mathcal{F}$, $\mathcal{G}$ and $\mathcal{S}$ have densities in appropriate Hilbert spaces; see the forthcoming
DPhil thesis of the second author [17].

We remark that the quantifiers in the constraints can be weakened if the set of
allowable measures is restricted. For example, if $\mathcal{G}$ is restricted to lie in a set of
measures that are absolutely continuous with respect to a fixed measure $\nu \in \mathcal{M}(\Gamma, \mathfrak{G})$,
then the quantifier $(y \in \Gamma)$ can be weakened to ($\nu$-almost all $y \in \Gamma$).
3 Classical Examples


Our general duality relation of Theorem 1 generalizes many classical duality results,
of which we now point out a few examples. Let $p(x_1, \dots, x_k)$ be a function of $k$
arguments. Then we write

$$ \mathbf{1}_{\{x:\, p(x) \ge 0\}} := \mathbf{1}_{\{y:\, p(y) \ge 0\}}(x) = \begin{cases} 1 & \text{if } p(x) \ge 0,\\ 0 & \text{otherwise.} \end{cases} $$

In other words, we write the argument $x$ of the indicator function directly into the set
$\{y : p(y) \ge 0\}$ that defines the function, rather than using a separate set of variables
$y$. This abuse of notation will make it easier to identify which inequality is satisfied
by the arguments where the function $\mathbf{1}_{\{y:\, p(y) \ge 0\}}(x)$ takes the value 1.

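We start with the Moment Problem studied by Bertsimas and Popescu [2], who
considered generalized Chebychev inequalities of the form

$$
\begin{aligned}
\text{(P')}\quad \sup_X\ & \mathbb{P}[r(X) \ge 0]\\
\text{s.t.}\ & \mathbb{E}\big[X_1^{k_1} \cdots X_n^{k_n}\big] = b_k, \quad (k \in J),\\
& X \text{ a random vector taking values in } \mathbb{R}^n,
\end{aligned}
$$

where $r: \mathbb{R}^n \to \mathbb{R}$ is a multivariate polynomial and $J \subset \mathbb{N}^n$ is a finite set of
multi-indices. In other words, some moments of $X$ are known. By choosing $\Phi = \mathbb{R}^n$,
$\Gamma = \emptyset$, $\Sigma = J \cup \{0\}$,

$$ B(k, x) = x_1^{k_1} \cdots x_n^{k_n}, \quad b(k) = b_k, \quad (k \in J), \qquad B(0, x) = \mathbf{1}_{\mathbb{R}^n}(x), \quad b(0) = 1, $$

and $c(x) = \mathbf{1}_{\{x:\, r(x) \ge 0\}}$, where we made use of the abuse of notation discussed above,
problem (P') becomes a special case of the primal problem considered in Sect. 2,

$$
\begin{aligned}
\text{(P)}\quad \sup_{\mathcal{F}}\ & \int_{\mathbb{R}^n} \mathbf{1}_{\{x:\, r(x) \ge 0\}}\, d\mathcal{F}(x)\\
\text{s.t.}\ & \int_{\mathbb{R}^n} x_1^{k_1} \cdots x_n^{k_n}\, d\mathcal{F}(x) = b_k, \quad (k \in J),\\
& \int_{\mathbb{R}^n} 1\, d\mathcal{F}(x) = 1,\\
& \mathcal{F} \ge 0.
\end{aligned}
$$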
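Theorem 1 already yields certified upper bounds for (P): any coefficients $(z_k)_{k \in J}, z_0$ with $\sum_{k \in J} z_k x_1^{k_1} \cdots x_n^{k_n} + z_0 \ge \mathbf{1}_{\{x:\, r(x) \ge 0\}}$ on $\mathbb{R}^n$ give the bound $\sum_k z_k b_k + z_0$. As a small check of our own (not from the chapter), take $n = 1$, $r(x) = x - t$ with $t > 0$, and the known moments $\mathbb{E}[X] = 0$, $\mathbb{E}[X^2] = \sigma^2$, which is the classical one-sided Chebychev (Cantelli) setting. A two-point law with mass $\sigma^2/(\sigma^2 + t^2)$ at $t$ is (P)-feasible. The quadratic $q(x) = \big((x + \sigma^2/t)/(t + \sigma^2/t)\big)^2$ is nonnegative everywhere and at least 1 for $x \ge t$, so its coefficients are dual-feasible. The sketch uses exact rational arithmetic to confirm that both values equal $\sigma^2/(\sigma^2 + t^2)$.

```python
from fractions import Fraction


def cantelli_primal_dual(sigma2, t):
    """sup P[X >= t] subject to E[X] = 0, E[X^2] = sigma2 (t > 0).
    Returns (value of a feasible two-point law, value of a feasible quadratic
    certificate, coefficients (z0, z1, z2) of that quadratic)."""
    sigma2, t = Fraction(sigma2), Fraction(t)
    s = sigma2 / t
    # Primal: mass p at t and 1 - p at -s; verify the moment constraints exactly.
    p = sigma2 / (sigma2 + t * t)
    support = [(t, p), (-s, 1 - p)]
    assert sum(w for _, w in support) == 1
    assert sum(w * x for x, w in support) == 0
    assert sum(w * x * x for x, w in support) == sigma2
    primal = sum(w for x, w in support if x >= t)
    # Dual: q(x) = z0 + z1 x + z2 x^2 = ((x + s) / (t + s))^2 >= 1{x >= t} on R.
    denom = (t + s) ** 2
    z0, z1, z2 = s * s / denom, 2 * s / denom, 1 / denom
    # Pair the coefficients with the right-hand sides b(0) = 1, b_1 = 0, b_2 = sigma2.
    dual = z0 * 1 + z1 * 0 + z2 * sigma2
    return primal, dual, (z0, z1, z2)
```

For $\sigma^2 = 1$ and $t = 2$, both values are $1/5$, so the bound is tight in this instance.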
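Our dual

$$
\begin{aligned}
\text{(D)}\quad \inf_{(z_k)_{k \in J},\, z_0}\ & \sum_{k \in J} z_k b_k + z_0\\
\text{s.t.}\ & \sum_{k \in J} z_k x_1^{k_1} \cdots x_n^{k_n} + z_0 \ge \mathbf{1}_{\{x:\, r(x) \ge 0\}}, \quad (x \in \mathbb{R}^n)
\end{aligned}
$$

is easily seen to be identical with the dual (D') identified by Bertsimas and Popescu,

$$
\begin{aligned}
\text{(D')}\quad \inf_{(z_k)_{k \in J},\, z_0}\ & \sum_{k \in J} z_k b_k + z_0\\
\text{s.t.}\ & \forall x \in \mathbb{R}^n,\ r(x) \ge 0 \implies \sum_{k \in J} z_k x_1^{k_1} \cdots x_n^{k_n} + z_0 - 1 \ge 0,\\
& \forall x \in \mathbb{R}^n,\ \sum_{k \in J} z_k x_1^{k_1} \cdots x_n^{k_n} + z_0 \ge 0.
\end{aligned}
$$

Note that since $\Gamma$ and $\Sigma$ are finite, the constraints of (D') are polynomial copositivity
constraints. The numerical solution of semi-infinite programming problems of this
type can be approached via a nested hierarchy of semidefinite programming relaxations
that yield better and better approximations to (D'). The highest level problem
within this hierarchy is guaranteed to solve (D') exactly, although the corresponding
SDP is of exponential size in the dimension $n$, in the degree of the polynomial $r$,
and in $\max_{k \in J} \sum_i k_i$. For further details see [2, 7, 10], and Sect. 4.6 below.

Next, we consider the Marginal Problem studied by Rüschendorf [15, 16] and
Ramachandran and Rüschendorf [14],

$$ \text{(P')}\quad \sup_{\mathcal{F} \in \mathcal{M}_{F_1, \dots, F_n}} \int_{\mathbb{R}^n} h(x)\, d\mathcal{F}(x), $$

where $\mathcal{M}_{F_1, \dots, F_n}$ is the set of probability measures on $\mathbb{R}^n$ whose marginals have the
cdfs $F_i$ $(i = 1, \dots, n)$. Problem (P') can easily be seen as a special case of the
framework of Sect. 2 by setting $c(x) = h(x)$, $\Phi = \mathbb{R}^n$, $\Gamma = \emptyset$, $\Sigma = \mathbb{N}_n \times \mathbb{R}$ with
$\mathbb{N}_n = \{1, \dots, n\}$, $B(i, z, x) = \mathbf{1}_{\{x:\, x_i \le z\}}$ (using the abuse of notation discussed earlier), and
$b(i, z) = F_i(z)$ $(i \in \mathbb{N}_n, z \in \mathbb{R})$,

$$
\begin{aligned}
\text{(P)}\quad \sup_{\mathcal{F}}\ & \int_{\mathbb{R}^n} h(x)\, d\mathcal{F}(x)\\
\text{s.t.}\ & \int_{\mathbb{R}^n} \mathbf{1}_{\{x:\, x_i \le z\}}\, d\mathcal{F}(x) = F_i(z), \quad (i \in \mathbb{N}_n,\ z \in \mathbb{R}),\\
& \mathcal{F} \ge 0.
\end{aligned}
$$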
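Taking the dual, we find

$$
\begin{aligned}
\text{(D)}\quad \inf_{\mathcal{S}_1, \dots, \mathcal{S}_n}\ & \sum_{i=1}^n \int_{\mathbb{R}} F_i(z)\, d\mathcal{S}_i(z)\\
\text{s.t.}\ & \sum_{i=1}^n \int_{\mathbb{R}} \mathbf{1}_{\{x_i \le z\}}\, d\mathcal{S}_i(z) \ge h(x), \quad (x \in \mathbb{R}^n).
\end{aligned}
$$

The signed measures $\mathcal{S}_i$ being of finite variation, the functions $S_i(z) = \mathcal{S}_i((-\infty, z])$
and the limits $s_i = \lim_{z \to \infty} S_i(z) = \mathcal{S}_i((-\infty, +\infty))$ are well defined and finite.
Furthermore, using $\lim_{z \to -\infty} F_i(z) = 0$ and $\lim_{z \to +\infty} F_i(z) = 1$, we have

$$
\begin{aligned}
\sum_{i=1}^n \int_{\mathbb{R}} F_i(z)\, d\mathcal{S}_i(z) &= \sum_{i=1}^n \Big( \big[F_i(z) S_i(z)\big]_{-\infty}^{+\infty} - \int_{\mathbb{R}} S_i(z)\, dF_i(z) \Big)\\
&= \sum_{i=1}^n s_i - \sum_{i=1}^n \int_{\mathbb{R}} S_i(z)\, dF_i(z) = \sum_{i=1}^n \int_{\mathbb{R}} \big(s_i - S_i(z)\big)\, dF_i(z),
\end{aligned}
$$

and likewise,

$$ \sum_{i=1}^n \int_{\mathbb{R}} \mathbf{1}_{\{x_i \le z\}}\, d\mathcal{S}_i(z) = \sum_{i=1}^n \int_{[x_i, +\infty)} 1\, d\mathcal{S}_i(z) = \sum_{i=1}^n \big(s_i - S_i(x_i)\big). $$

Writing $h_i(z) = s_i - S_i(z)$, (D) is, therefore, equivalent to

$$
\begin{aligned}
\text{(D')}\quad \inf_{h_1, \dots, h_n}\ & \sum_{i=1}^n \int_{\mathbb{R}} h_i(z)\, dF_i(z)\\
\text{s.t.}\ & \sum_{i=1}^n h_i(x_i) \ge h(x), \quad (x \in \mathbb{R}^n).
\end{aligned}
$$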

This is the dual identified by Ramachandran and Rüschendorf [14]. Due to the general
form of the functions $h_i$, the infinite programming problem (D') is not directly
usable in numerical computations. However, for specific $h(x)$, (D')-feasible functions
$(h_1, \dots, h_n)$ can sometimes be constructed explicitly, yielding an upper bound on
the optimal objective function value of (P') by virtue of Theorem 1. Embrechts and
Puccetti [3-5] used this approach to derive quantile bounds on $X_1 + \dots + X_n$, where
$X$ is a random vector with known marginals but unknown joint distribution. In this
case, the relevant primal objective function is defined by $h(x) = \mathbf{1}_{\{x:\, e^T x \ge t\}}$, where
$e = (1, \dots, 1)^T$ and $t \in \mathbb{R}$ is a fixed level. More generally, $h(x) = \mathbf{1}_{\{x:\, \Psi(x) \ge t\}}$ can be
chosen, where $\Psi$ is a relevant risk aggregation function, or $h(x)$ can model any risk
measure of choice.
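To see Theorem 1 at work on (P') and (D') in a case small enough to solve by enumeration, take $n = 2$, marginals uniform on $\{1, \dots, m\}$ (a discrete assumption of ours, unlike the continuous setting above) and $h(x) = \mathbf{1}_{\{x:\, x_1 + x_2 \ge t\}}$ with $m + 1 \le t \le 2m$. Couplings of two discrete uniform laws are mixtures of permutation couplings (Birkhoff-von Neumann), so the primal optimum is attained at a permutation. On the dual side, $h_1 = h_2 = (z - c)^+/(t - 2c)$ is (D')-feasible for any $c < t/2$. If $x_1 + x_2 \ge t$, either both $x_i$ exceed $c$, and the sum is at least $(t - 2c)/(t - 2c) = 1$, or one of them is at most $c$, in which case the other is at least $t - c$. The feasible family and the choice $c = t - m - 1$ are ours. The sketch checks that this dual bound matches the primal optimum $(2m + 1 - t)/m$.

```python
from fractions import Fraction
from itertools import permutations


def worst_case_sum_prob(m, t):
    """Primal (P'): sup P[X1 + X2 >= t] over couplings of two uniform laws on
    {1, ..., m}, computed by enumerating permutation couplings."""
    values = range(1, m + 1)
    best = 0
    for sigma in permutations(values):
        hits = sum(1 for i, j in zip(values, sigma) if i + j >= t)
        best = max(best, hits)
    return Fraction(best, m)


def dual_bound(m, t, c):
    """Dual (D') value of h_1 = h_2 = (z - c)^+ / (t - 2c), feasible for c < t/2."""
    assert 2 * c < t

    def h_i(z):
        return Fraction(max(z - c, 0), t - 2 * c)

    # sum_i int h_i dF_i with F_i uniform on {1, ..., m}
    return 2 * sum(h_i(z) for z in range(1, m + 1)) / m
```

For $m = 5$ and $t = 8$, both functions return $3/5$; other choices of $c$ give larger, still valid, upper bounds.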

Our next example is the Marginal Problem with Copula Bounds, an extension
to the marginal problem mentioned in [3]. The copula defined by the probability
measure $\mathcal{F}$ with marginals $F_i$ is the function

$$ \mathcal{C}_{\mathcal{F}}: [0, 1]^n \to [0, 1], \quad u \mapsto \mathcal{F}\big(F_1^{-1}(u_1), \dots, F_n^{-1}(u_n)\big). $$

A copula is any function $\mathcal{C}: [0, 1]^n \to [0, 1]$ that satisfies $\mathcal{C} = \mathcal{C}_{\mathcal{F}}$ for some
probability measure $\mathcal{F}$ on $\mathbb{R}^n$. Equivalently, a copula is the multivariate cdf of any
probability measure on the unit cube $[0, 1]^n$ with uniform marginals. In quantitative
risk management, using the model

$$ \sup_{\mathcal{F} \in \mathcal{M}_{F_1, \dots, F_n}} \int h(x)\, d\mathcal{F}(x) $$

to bound the worst-case risk for a random vector $X$ with marginal distributions $F_i$
can be overly conservative, as no dependence structure between the coordinates $X_i$
of $X$ is assumed given at all. The structure that determines this dependence being the
copula $\mathcal{C}_{\mathcal{F}}$, where $\mathcal{F}$ is the multivariate distribution of $X$, Embrechts and Puccetti
[3] suggest problems of the form

$$
\begin{aligned}
\text{(P')}\quad \sup_{\mathcal{F} \in \mathcal{M}_{F_1, \dots, F_n}}\ & \int h(x)\, d\mathcal{F}(x),\\
\text{s.t.}\ & \mathcal{C}_{\mathrm{lo}} \le \mathcal{C}_{\mathcal{F}} \le \mathcal{C}_{\mathrm{up}},
\end{aligned}
$$

as a natural framework to study the situation in which partial dependence information
is available. In problem (P'), $\mathcal{C}_{\mathrm{lo}}$ and $\mathcal{C}_{\mathrm{up}}$ are given copulas, and inequality between
copulas is defined by pointwise inequality,

$$ \mathcal{C}_{\mathrm{lo}}(u) \le \mathcal{C}_{\mathcal{F}}(u) \quad (u \in [0, 1]^n). $$
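As a small illustration of this pointwise order (our own, bivariate, not from the chapter): every bivariate copula lies between the Fréchet-Hoeffding bounds $W(u, v) = \max(u + v - 1, 0)$ and $M(u, v) = \min(u, v)$. For $n = 2$ these are therefore the loosest possible choices of $\mathcal{C}_{\mathrm{lo}}$ and $\mathcal{C}_{\mathrm{up}}$ in (P'). The sketch computes the empirical copula of a simulated sample from its ranks and checks $W \le C \le M$ on the grid $\{0, 1/N, \dots, 1\}^2$. The data-generating model is arbitrary, and ties in the data are assumed absent.

```python
import random
from fractions import Fraction


def empirical_copula(xs, ys):
    """Empirical copula on the grid u = a/N, v = b/N:
    C(a, b) = #{i : rank(x_i) <= a and rank(y_i) <= b} / N (no ties assumed)."""
    N = len(xs)
    rx = {x: r for r, x in enumerate(sorted(xs), start=1)}
    ry = {y: r for r, y in enumerate(sorted(ys), start=1)}
    ranks = [(rx[x], ry[y]) for x, y in zip(xs, ys)]

    def C(a, b):
        return Fraction(sum(1 for r, s in ranks if r <= a and s <= b), N)

    return C


def within_frechet_bounds(C, N):
    """Check max(u + v - 1, 0) <= C(u, v) <= min(u, v) on the grid of step 1/N."""
    for a in range(N + 1):
        for b in range(N + 1):
            u, v = Fraction(a, N), Fraction(b, N)
            if not (max(u + v - 1, 0) <= C(a, b) <= min(u, v)):
                return False
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(30)]
    ys = [x + rng.gauss(0.0, 0.3) for x in xs]      # positively dependent pair
    print(within_frechet_bounds(empirical_copula(xs, ys), 30))
```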

Once again, (P’) is a special case of the general framework studied in Sect. 2, as it is 
equivalent to write 
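$$
\begin{aligned}
\text{(P)}\quad \sup_{\mathcal{F}}\ & \int_{\mathbb{R}^n} h(x)\, d\mathcal{F}(x)\\
\text{s.t.}\ & \int_{\mathbb{R}^n} \mathbf{1}_{\{x \le (F_1^{-1}(u_1), \dots, F_n^{-1}(u_n))\}}(u, x)\, d\mathcal{F}(x) \le \mathcal{C}_{\mathrm{up}}(u), \quad (u \in [0, 1]^n),\\
& \int_{\mathbb{R}^n} -\mathbf{1}_{\{x \le (F_1^{-1}(u_1), \dots, F_n^{-1}(u_n))\}}(u, x)\, d\mathcal{F}(x) \le -\mathcal{C}_{\mathrm{lo}}(u), \quad (u \in [0, 1]^n),\\
& \int_{\mathbb{R}^n} \mathbf{1}_{\{x_i \le z\}}(z, x)\, d\mathcal{F}(x) = F_i(z), \quad (i \in \mathbb{N}_n,\ z \in \mathbb{R}),\\
& \mathcal{F} \ge 0.
\end{aligned}
$$

The dual of this problem is given by

$$
\begin{aligned}
\text{(D)}\quad \inf_{\mathcal{G}_{\mathrm{up}}, \mathcal{G}_{\mathrm{lo}}, \mathcal{S}_1, \dots, \mathcal{S}_n}\ & \int_{[0,1]^n} \mathcal{C}_{\mathrm{up}}(u)\, d\mathcal{G}_{\mathrm{up}}(u) - \int_{[0,1]^n} \mathcal{C}_{\mathrm{lo}}(u)\, d\mathcal{G}_{\mathrm{lo}}(u) + \sum_{i=1}^n \int_{\mathbb{R}} F_i(z)\, d\mathcal{S}_i(z)\\
\text{s.t.}\ & \int_{[0,1]^n} \mathbf{1}_{\{x \le (F_1^{-1}(u_1), \dots, F_n^{-1}(u_n))\}}(u, x)\, d\mathcal{G}_{\mathrm{up}}(u) - \int_{[0,1]^n} \mathbf{1}_{\{x \le (F_1^{-1}(u_1), \dots, F_n^{-1}(u_n))\}}(u, x)\, d\mathcal{G}_{\mathrm{lo}}(u)\\
&\qquad + \sum_{i=1}^n \int_{\mathbb{R}} \mathbf{1}_{\{x_i \le z\}}\, d\mathcal{S}_i(z) \ge h(x), \quad (x \in \mathbb{R}^n),\\
& \mathcal{G}_{\mathrm{lo}}, \mathcal{G}_{\mathrm{up}} \ge 0.
\end{aligned}
$$

Using the notation $s_i$, $S_i$ introduced in Sect. 3, this problem can be written as

$$
\begin{aligned}
\inf\ & \int_{[0,1]^n} \mathcal{C}_{\mathrm{up}}(u)\, d\mathcal{G}_{\mathrm{up}}(u) - \int_{[0,1]^n} \mathcal{C}_{\mathrm{lo}}(u)\, d\mathcal{G}_{\mathrm{lo}}(u) + \sum_{i=1}^n \int_{\mathbb{R}} \big(s_i - S_i(z)\big)\, dF_i(z)\\
\text{s.t.}\ & \mathcal{G}_{\mathrm{up}}(\mathcal{U}(x)) - \mathcal{G}_{\mathrm{lo}}(\mathcal{U}(x)) + \sum_{i=1}^n \big(s_i - S_i(x_i)\big) \ge h(x), \quad (x \in \mathbb{R}^n),\\
& \mathcal{G}_{\mathrm{up}}, \mathcal{G}_{\mathrm{lo}} \ge 0,
\end{aligned}
$$

where $\mathcal{U}(x) = \{u \in [0, 1]^n : u \ge (F_1(x_1), \dots, F_n(x_n))\}$. To the best of our
knowledge, this dual has not been identified before.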

Due to the high dimensionality of the space of variables and constraints both in 
the primal and dual, the marginal problem with copula bounds is difficult to solve 
numerically, even for very coarse discrete approximations. 
4 Robust Risk Aggregation via Bounds on Integrals

In quantitative risk management, distributions are often estimated within a parametric
family from the available data. For example, the tails of marginal distributions
may be estimated via extreme value theory, or a Gaussian copula may be fitted to the
multivariate distribution of all risks under consideration, to model their dependencies.
The choice of a parametric family introduces model uncertainty, while fitting a
distribution from this family via statistical estimation introduces parameter uncertainty.
In both cases, a more robust alternative would be to study models in which
the available data are only used to estimate upper and lower bounds on finitely many
integrals of the form

$$ \int_\Phi \phi(x)\, d\mathcal{F}(x), \tag{1} $$

where $\phi(x)$ is a suitable test function. A suitable way of estimating upper and lower
bounds on such integrals from sample data $x_i$ ($i = 1, \dots, N$) is to estimate confidence
bounds via bootstrapping.
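One simple way to obtain such bounds is a percentile bootstrap of the sample mean $\frac{1}{N} \sum_i \phi(x_i)$. The sketch below is our own and only illustrative. The confidence level, the number of resamples `n_boot` and the seed are arbitrary choices, and other bootstrap intervals may be preferable in practice. The returned pair could serve as the lower and upper bounds placed on the integral (1).

```python
import random


def bootstrap_integral_bounds(samples, phi, level=0.90, n_boot=2000, seed=0):
    """Percentile-bootstrap bounds for the integral of phi against the unknown
    law of the data. Returns (lower bound, point estimate, upper bound)."""
    rng = random.Random(seed)
    values = [phi(x) for x in samples]
    n = len(values)
    # Resample with replacement and record the mean of phi each time.
    boot_means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[int((1.0 - alpha) * n_boot) - 1]
    return lower, sum(values) / n, upper
```

For instance, `bootstrap_integral_bounds(data, lambda x: 1.0 + 0.5 * x)` returns a lower bound, the plug-in estimate and an upper bound for $\int (1 + x/2)\, d\mathcal{F}(x)$.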


4.1 Motivation 

To motivate the use of constraints in the form of bounds on integrals (1), we offer the
following explanations: First of all, discretized marginal constraints are of this form
with piecewise constant test functions, as the requirement that $F_i(\xi_k) - F_i(\xi_{k-1}) = b_k$
$(k = 1, \dots, \ell)$ for a fixed set of discretization points $\xi_0 < \dots < \xi_\ell$ can be expressed
as

$$ \int_\Phi \mathbf{1}_{\{x:\, \xi_{k-1} < x_i \le \xi_k\}}\, d\mathcal{F}(x) = b_k, \quad (k = 1, \dots, \ell). \tag{2} $$

It is, furthermore, quite natural to relax each of these equality constraints to two
inequality constraints

$$ \underline{b}_k \le \int_\Phi \mathbf{1}_{\{x:\, \xi_{k-1} < x_i \le \xi_k\}}\, d\mathcal{F}(x) \le \overline{b}_k, $$

when $b_k$ is estimated from data.

More generally, constraints of the form $\mathbb{P}[X \in S_j] \le b'_j$ for some measurable
$S_j \subseteq \mathbb{R}^n$ of interest can be written as

$$ \int_\Phi \mathbf{1}_{S_j}(x)\, d\mathcal{F}(x) \le b'_j. $$

A collection of $\ell$ constraints of this form can be relaxed by replacing them by a
convex combination

$$ \int_\Phi \Big( \sum_{j=1}^{\ell} w_j \mathbf{1}_{S_j}(x) \Big)\, d\mathcal{F}(x) \le \sum_{j=1}^{\ell} w_j b'_j, $$

where the weights $w_j \ge 0$ satisfy $\sum_j w_j = 1$ and express the relative importance of
each constituent constraint. Nonnegative test functions thus have a natural interpretation
as importance densities in sums-of-constraints relaxations. This allows one to
put higher focus on getting the probability mass right in regions where it particularly
matters (e.g., values of $X$ that account for the bulk of the profits of a financial
institution), while maximizing the risk in the tails without having to resort to too fine a
discretization.

While this suggests using a piecewise approximation of a prior estimate of the
density of $X$ as a test function, the results are robust under mis-specification of this
prior: as long as $\phi(x)$ is nonconstant, constraints that involve the integral (1)
tend to force the probability weight of $X$ into the regions where the sample points
are denser. To illustrate this, consider a univariate random variable with density
$f(x) = \frac{2}{3}(1 + x)$ on $x \in [0, 1]$ and test function $\phi(x) = 1 + ax$ with $a \in [-1, 1]$.
Then $\int_0^1 \phi(x) f(x)\, dx = 1 + 5a/9$. The most dispersed probability measure on $[0, 1]$
that satisfies

$$ \int_0^1 \phi(x)\, d\mu(x) = 1 + \frac{5a}{9} \tag{3} $$

has an atom of weight 4/9 at 0 and an atom of weight 5/9 at 1 independently of $a$,
as long as $a \ne 0$. (For $a \ne 0$, (3) fixes the mean $\int_0^1 x\, d\mu(x) = 5/9$, and among
probability measures on $[0, 1]$ with a given mean, the two-point law on $\{0, 1\}$ is the most
dispersed.) The constraint (3) thus forces more probability mass into the right
half of the interval $[0, 1]$, where the unknown (true) density $f(x)$ has more mass and
produces more sample points.

As a second illustration, take the density $f(x) = 3x^2$ and the same linear test
function as above. This time we find $\int_0^1 \phi(x) f(x)\, dx = 1 + 3a/4$, and the most
dispersed probability measure on $[0, 1]$ that satisfies

$$ \int_0^1 \phi(x)\, d\mu(x) = 1 + \frac{3a}{4} $$

has an atom of weight 1/4 at 0 and an atom of weight 3/4 at 1 independently of
$a \ne 0$, with similar conclusions as above, except that the effect is even stronger,
correctly reflecting the qualitative features of the density $f(x)$.
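Both illustrations can be checked with exact rational arithmetic. For a two-point law on $\{0, 1\}$ we have $\int_0^1 (1 + ax)\, d\mu(x) = 1 + a\,\mu(\{1\})$, so matching the right-hand side forces $\mu(\{1\})$ to equal the mean of $f$, whatever the value of $a \ne 0$. The helper names below are ours. Densities are passed as polynomial coefficients on $[0, 1]$.

```python
from fractions import Fraction


def poly_mean(coeffs):
    """Mean of the density f(x) = sum_k coeffs[k] x^k on [0, 1]:
    int_0^1 x f(x) dx = sum_k coeffs[k] / (k + 2)."""
    return sum(Fraction(c) / (k + 2) for k, c in enumerate(coeffs))


def extremal_two_point(coeffs, a):
    """Two-point law on {0, 1} satisfying int (1 + a x) dmu = int (1 + a x) f(x) dx.
    Returns (weight at 0, weight at 1); requires a != 0."""
    assert a != 0
    target = 1 + a * poly_mean(coeffs)       # right-hand side, cf. (3)
    w1 = (target - 1) / a                    # since int (1 + a x) dmu = 1 + a * mu({1})
    return 1 - w1, w1
```

With `[Fraction(2, 3), Fraction(2, 3)]` this returns the weights (4/9, 5/9), and with `[0, 0, 3]` (the density $3x^2$) it returns (1/4, 3/4), for every tested $a \ne 0$.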


4.2 General Setup and Duality 


Let $\Phi$ be decomposed into a partition $\Phi = \bigcup_{i=1}^k S_i$ of polyhedra $S_i$ with nonempty
interior, chosen as regions in which a reasonable number of data points are available
to estimate integrals of the form (1).

Each polyhedron has a primal description in terms of generators,

$$ S_i = \operatorname{conv}(q_1^i, \dots, q_{n_i}^i) + \operatorname{cone}(r_1^i, \dots, r_{o_i}^i), $$

where $\operatorname{conv}(q_1^i, \dots, q_{n_i}^i)$ is the polytope with vertices $q_l^i \in \mathbb{R}^n$, and

$$ \operatorname{cone}(r_1^i, \dots, r_{o_i}^i) = \Big\{ \sum_{m=1}^{o_i} \theta_m r_m^i :\ \theta_m \ge 0\ (m \in \mathbb{N}_{o_i}) \Big\} $$

is the polyhedral cone with recession directions $r_m^i \in \mathbb{R}^n$. Each polyhedron also has
a dual description in terms of linear inequalities,

$$ S_i = \bigcap_{j=1}^{k_i} \{ x \in \mathbb{R}^n : \langle f_j^i, x \rangle \ge \ell_j^i \}, $$

for some vectors $f_j^i \in \mathbb{R}^n$ and bounds $\ell_j^i \in \mathbb{R}$. The main case of interest is where $S_i$
is either a finite or infinite box in $\mathbb{R}^n$ with faces parallel to the coordinate axes, or an
intersection of such a box with a linear half-space, in which case it is easy to pass
between the primal and dual descriptions. Note, however, that the dual description is
preferable, as the description of a box in $\mathbb{R}^n$ requires only $2n$ linear inequalities,
while the primal description requires $2^n$ extreme vertices.

Let us now consider the problem

$$
\begin{aligned}
\text{(P)}\quad \sup_{\mathcal{F}}\ & \int_\Phi h(x)\, d\mathcal{F}(x)\\
\text{s.t.}\ & \int_\Phi \phi_s(x)\, d\mathcal{F}(x) \le a_s, \quad (s = 1, \dots, M),\\
& \int_\Phi \psi_t(x)\, d\mathcal{F}(x) = b_t, \quad (t = 1, \dots, N),\\
& \int_\Phi 1\, d\mathcal{F}(x) = 1,\\
& \mathcal{F} \ge 0,
\end{aligned}
$$

where the test functions $\psi_t$ are piecewise linear on the partition $\Phi = \bigcup_{i=1}^k S_i$, and
where $-h(x)$ and the test functions $\phi_s$ are piecewise linear on the infinite polyhedra
of the partition, and either jointly linear, concave, or convex on the finite polyhedra
(i.e., polytopes) of the partition. The dual of (P) is

$$
\begin{aligned}
\text{(D)}\quad \inf_{(y, z) \in \mathbb{R}^{M + N + 1}}\ & \sum_{s=1}^M a_s y_s + \sum_{t=1}^N b_t z_t + z_0,\\
\text{s.t.}\ & \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 \mathbf{1}_\Phi(x) - h(x) \ge 0, \quad (x \in \Phi), \qquad (4)\\
& y \ge 0.
\end{aligned}
$$


We remark that (P) is a semi-infinite programming problem with infinitely many
variables and finitely many constraints, while (D) is a semi-infinite programming
problem with finitely many variables and infinitely many constraints. However,
the constraint (4) of (D) can be rewritten as copositivity requirements over the
polyhedra $S_i$,

$$ \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 \mathbf{1}_\Phi(x) - h(x) \ge 0, \quad (x \in S_i), \quad (i = 1, \dots, k). $$

Next we will see how these copositivity constraints can be handled numerically, often
by relaxing all but finitely many constraints. Nesterov's first-order method can be
adapted to solve the resulting problems, see [8, 9, 17].

In what follows, we will use the notation

$$ \varphi_{y,z}(x) = \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 - h(x). $$


4.3 Piecewise Linear Test Functions 


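The first case we discuss is when $\phi_s|_{S_i}$ and $h|_{S_i}$ are jointly linear. Since we
furthermore assumed that the functions $\psi_t|_{S_i}$ are linear, there exist vectors $v_s^i \in \mathbb{R}^n$,
$w_t^i \in \mathbb{R}^n$, $g^i \in \mathbb{R}^n$ and constants $c_s^i \in \mathbb{R}$, $d_t^i \in \mathbb{R}$ and $e^i \in \mathbb{R}$ such that

$$
\begin{aligned}
\phi_s|_{S_i}(x) &= \langle v_s^i, x \rangle + c_s^i,\\
\psi_t|_{S_i}(x) &= \langle w_t^i, x \rangle + d_t^i,\\
h|_{S_i}(x) &= \langle g^i, x \rangle + e^i.
\end{aligned}
$$

The copositivity condition

$$ \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 \mathbf{1}_\Phi(x) - h(x) \ge 0, \quad (x \in S_i) $$

can then be written as

$$ \langle f_j^i, x \rangle \ge \ell_j^i\ (j = 1, \dots, k_i) \implies \Big\langle \sum_{s=1}^M y_s v_s^i + \sum_{t=1}^N z_t w_t^i - g^i,\ x \Big\rangle \ge e^i - \sum_{s=1}^M y_s c_s^i - \sum_{t=1}^N z_t d_t^i - z_0. $$

By Farkas' Lemma, this is equivalent to the constraints

$$
\begin{aligned}
& \sum_{s=1}^M y_s v_s^i + \sum_{t=1}^N z_t w_t^i - g^i = \sum_{j=1}^{k_i} \lambda_j^i f_j^i, && (5)\\
& e^i - \sum_{s=1}^M y_s c_s^i - \sum_{t=1}^N z_t d_t^i - \sum_{j=1}^{k_i} \lambda_j^i \ell_j^i - z_0 \le 0, && (6)\\
& \lambda_j^i \ge 0, \quad (j = 1, \dots, k_i), && (7)
\end{aligned}
$$

where $\lambda_j^i$ are additional auxiliary decision variables.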

Thus, if all test functions are linear on all polyhedral pieces $S_i$, then the dual
(D) can be solved as a linear programming problem with $M + N + 1 + \sum_{i=1}^k k_i$
variables and $k(n + 1)$ linear constraints, plus bound constraints on $y$ and the $\lambda_j^i$.
More generally, if some but not all polyhedra correspond to jointly linear test function
pieces, then jointly linear pieces can be treated as discussed above, while other pieces
can be treated as discussed below.
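For the boxes singled out above as the main case of interest, the Farkas multipliers in (5)-(7) have a closed form. Write a box $\{x : lo \le x \le hi\}$ in the dual description with rows $f = e_k$, $\ell = lo_k$ and $f = -e_k$, $\ell = -hi_k$. The multipliers $u_k^+$ and $u_k^-$ then satisfy $\sum_j \lambda_j f_j = u$, and $\sum_j \lambda_j \ell_j$ equals $\min_{x \in \text{box}} \langle u, x \rangle$. The sketch below is our own pure-Python illustration with hypothetical helper names. It builds this certificate for an affine function $\langle u, x \rangle + \beta$ and cross-checks the bound against brute-force enumeration of the $2^n$ vertices.

```python
import random
from itertools import product


def box_certificate(u, lo, hi):
    """Farkas multipliers for <u, x> on the box lo <= x <= hi, written as
    <f_j, x> >= l_j with rows (f = +e_k, l = lo_k) and (f = -e_k, l = -hi_k).
    Returns both multiplier lists and the certified bound sum_j lam_j * l_j."""
    lam_lower = [max(uk, 0.0) for uk in u]    # multipliers of  x_k >= lo_k
    lam_upper = [max(-uk, 0.0) for uk in u]   # multipliers of -x_k >= -hi_k
    # Constraint (5) holds componentwise: lam_lower[k] - lam_upper[k] == u[k].
    bound = (sum(lam * l for lam, l in zip(lam_lower, lo))
             - sum(lam * h for lam, h in zip(lam_upper, hi)))
    return lam_lower, lam_upper, bound


def affine_nonneg_on_box(u, beta, lo, hi):
    """<u, x> + beta >= 0 on the whole box iff beta + sum_j lam_j l_j >= 0 (cf. (6))."""
    _, _, bound = box_certificate(u, lo, hi)
    return beta + bound >= 0


def brute_force_min(u, lo, hi):
    """Minimum of <u, x> over the 2**n box vertices (for cross-checking only)."""
    return min(sum(a * x for a, x in zip(u, v)) for v in product(*zip(lo, hi)))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        n = rng.randint(1, 4)
        lo = [rng.uniform(-2, 0) for _ in range(n)]
        hi = [l + rng.uniform(0, 3) for l in lo]
        u = [rng.uniform(-1, 1) for _ in range(n)]
        print(box_certificate(u, lo, hi)[2], brute_force_min(u, lo, hi))
```

Because the certificate is available in closed form, nonnegativity on a box costs $O(n)$ operations rather than a search over $2^n$ vertices.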

Let us briefly comment on numerical implementations, further details of which
are described in the second author's thesis [17]: An important case of the above-described
framework corresponds to a discretized marginal problem in which $\phi_s(x)$
are piecewise constant functions chosen as follows for $s = (l, j)$, $(l = 1, \dots, n;\ j = 1, \dots, m)$:
Introduce $m + 1$ breakpoints $\xi_0^l < \xi_1^l < \dots < \xi_m^l$ along each coordinate
axis $l$, and consider the infinite slabs

$$ S_{l,j} = \{ x \in \mathbb{R}^n : \xi_{j-1}^l \le x_l \le \xi_j^l \}, \quad (j = 1, \dots, m). $$
Then choose $\phi_{l,j}(x) = \mathbf{1}_{S_{l,j}}(x)$, the indicator function of slab $S_{l,j}$. We remark that
this approach corresponds to discretizing the constraints of the Marginal Problem
described in Sect. 3, but not to discretizing the probability measures over which we
maximize the aggregated risk.

While the number of test functions is $nm$ and thus linear in the problem dimension,
the number of polyhedra to consider is exponentially large, as all intersections of the
form

$$ S_{\mathbf{j}} = \bigcap_{l=1}^n S_{l, j_l} $$

for the $m^n$ possible choices of $\mathbf{j} \in \mathbb{N}_m^n$ have to be treated separately. In addition, in VaR
applications $h(x)$ is taken as the indicator function of an affine half-space $\{x : \sum_l x_l \ge \tau\}$
for a suitably chosen threshold $\tau$, and for CVaR applications $h(x)$ is chosen as the
piecewise linear function $h(x) = \max(0, \sum_l x_l - \tau)$. Thus, polyhedra $S_{\mathbf{j}}$ that meet
the affine hyperplane $\{x : \sum_l x_l = \tau\}$ are further sliced into two separate polyhedra.
A straightforward application of the above-described LP framework would thus lead
to an LP with exponentially many constraints and variables. Note, however, that the
constraints (5)-(7) now read

$$
\begin{aligned}
& -g^i = \sum_{j=1}^{k_i} \lambda_j^i f_j^i, && (8)\\
& e^i - \sum_{s=1}^M y_s c_s^i - \sum_{j=1}^{k_i} \lambda_j^i \ell_j^i - z_0 \le 0, && (9)\\
& \lambda_j^i \ge 0, \quad (j = 1, \dots, k_i), && (10)
\end{aligned}
$$

as $v_s^i = 0$ and no test functions $\psi_t(x)$ were used, with $g^i = [1 \dots 1]^T$ when $S_i \subseteq \{x : \sum_l x_l \ge \tau\}$
and $g^i = 0$ otherwise. That is, the vector that appears in the left-hand
side of Constraint (8) is fixed by the polyhedron $S_i$ alone and does not depend on the
decision variables $y, z_0$. Since $z_0$ is to be chosen as small as possible in an optimal
solution of (D), the constraint (9) has to be made as slack as possible. Therefore, the
optimal values of $\lambda_j^i$ are also fixed by the polyhedron $S_i$ alone and are identifiable
by solving the small-scale LP

$$
\begin{aligned}
(\lambda^i)^* = \arg\max_{\lambda}\ & \sum_{j=1}^{k_i} \lambda_j \ell_j^i\\
\text{s.t.}\ & \sum_{j=1}^{k_i} \lambda_j f_j^i = -g^i,\\
& \lambda_j \ge 0, \quad (j = 1, \dots, k_i).
\end{aligned}
$$
In other words, when the polyhedron $S_i$ is considered for the first time, the variables
$(\lambda_j^i)^*$ can be determined once and for all, after which the constraints (8)-(10) can be
replaced by

$$ e^i - \sum_s y_s c_s^i - z_0 \le C_i, $$

where $C_i = \sum_{j=1}^{k_i} (\lambda_j^i)^* \ell_j^i$, and where the sum on the left-hand side only extends
over the $n$ indices $s$ that correspond to test functions that are nonzero on $S_i$. Thus,
only the $nm + 1$ decision variables $(y, z_0)$ are needed to solve (D). Furthermore, the
exponentially many constraints correspond to an extremely sparse constraint matrix,
making the dual of (D) an ideal candidate to apply the simplex algorithm with delayed
column generation. A similar approach is possible for the situation where $\phi_s$ is of
the form

$$ \phi_s(x) = \mathbf{1}_{S_s}(x) \times \big( \langle v_s, x \rangle + c_s \big), $$

for all $s = (l, j)$. The advantage of using test functions of this form is that fewer
breakpoints $\xi_j^l$ are needed to constrain the distribution appropriately.
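To make the reduced per-cell constraint concrete for slab indicators: on a grid cell $S_i$ we have $c_s^i = 1$ for the $n$ slabs containing the cell and $c_s^i = 0$ otherwise. By LP duality $C_i = \min_{x \in S_i} \langle -g^i, x \rangle$, so the constraint reads $\sum_{s \text{ active}} y_s + z_0 \ge \max_{x \in S_i} h(x)$. The sketch below is our own illustration under stated assumptions. It assumes a bounded grid with the same number of breakpoints per coordinate and closed cells, only upper-bound slab constraints (so $y \ge 0$), the CVaR-type $h(x) = \max(0, \sum_l x_l - \tau)$ and $\tau \ge 0$. Instead of slicing cells along the hyperplane, it uses that $h$ is nondecreasing in every coordinate, so its maximum over a cell sits at the upper corner, which is the larger of the two slice maxima. The point `candidate_dual` ($y_{(l,j)}$ = max of the slab's upper endpoint and $\tau/n$, $z_0 = -\tau$) is a hypothetical feasible point, not an optimal one.

```python
from itertools import product


def candidate_dual(breaks, tau):
    """Hypothetical feasible point: y[l][j] = max(upper endpoint of slab (l, j), tau/n),
    z0 = -tau (feasible because sum_l max(u_l, tau/n) - tau >= max(0, sum_l u_l - tau))."""
    n = len(breaks)
    y = [[max(b[j + 1], tau / n) for j in range(len(b) - 1)] for b in breaks]
    return y, -tau


def cvar_dual_check(breaks, y, z0, tau, tol=1e-12):
    """Check sum_l y[l][j_l] + z0 >= max_{x in cell} h(x) on every grid cell, for
    h(x) = max(0, sum(x) - tau). breaks[l] holds the m + 1 breakpoints of coordinate l
    (same m for all l); y[l][j] belongs to the slab breaks[l][j] <= x_l <= breaks[l][j+1]."""
    n, m = len(breaks), len(breaks[0]) - 1
    for cell in product(range(m), repeat=n):       # m**n cells: exponential in n
        # h is nondecreasing in each coordinate: its max is at the cell's upper corner.
        h_max = max(0.0, sum(breaks[l][cell[l] + 1] for l in range(n)) - tau)
        if sum(y[l][cell[l]] for l in range(n)) + z0 < h_max - tol:
            return False
    return True


def dual_value(y, z0, probs):
    """Dual objective sum_s a_s y_s + z0, with a_s the slab probability upper bounds."""
    return sum(yl[j] * pl[j] for yl, pl in zip(y, probs) for j in range(len(yl))) + z0
```

With two coordinates, breakpoints 0, 1, 2, $\tau = 2$ and slab probability bounds 1/2, the candidate point passes the check and gives the upper bound 1.0 on the CVaR-type objective.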


4.4 Piecewise Convex Test Functions 

When $\phi_s|_{S_i}$ and $-h|_{S_i}$ are jointly convex, then $\varphi_{y,z}(x)$ is convex. The copositivity
constraint

$$ \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 \mathbf{1}_\Phi(x) - h(x) \ge 0, \quad (x \in S_i) $$

can then be written as

$$ \langle f_j^i, x \rangle \ge \ell_j^i\ (j = 1, \dots, k_i) \implies \varphi_{y,z}(x) \ge 0, $$

and by Farkas' Theorem (see, e.g., [11]), this condition is equivalent to

$$
\begin{aligned}
& \varphi_{y,z}(x) + \sum_{j=1}^{k_i} \lambda_j^i \big( \ell_j^i - \langle f_j^i, x \rangle \big) \ge 0, \quad (x \in \mathbb{R}^n), && (11)\\
& \lambda_j^i \ge 0, \quad (j = 1, \dots, k_i),
\end{aligned}
$$

where $\lambda_j^i$ are once again auxiliary decision variables. While (11) does not reduce to
finitely many constraints, the validity of this condition can be checked numerically
by globally minimizing the convex function $\varphi_{y,z}(x) + \sum_j \lambda_j^i (\ell_j^i - \langle f_j^i, x \rangle)$. The
constraint (11) can then be enforced explicitly if a line-search method is used to solve
the dual (D).


4.5 Piecewise Concave Test Functions 

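When $\phi_s|_{S_i}$ and $-h|_{S_i}$ are jointly concave but not linear, then $\varphi_{y,z}(x)$ is concave
and $S_i = \operatorname{conv}(q_1^i, \dots, q_{n_i}^i)$ is a polytope. The copositivity constraint

$$ \sum_{s=1}^M y_s \phi_s(x) + \sum_{t=1}^N z_t \psi_t(x) + z_0 \mathbf{1}_\Phi(x) - h(x) \ge 0, \quad (x \in S_i) \tag{12} $$

can then be written as

$$ \varphi_{y,z}(q_l^i) \ge 0, \quad (l = 1, \dots, n_i), $$

since a concave function attains its minimum over a polytope at one of its vertices.
Thus, (12) can be replaced by $n_i$ linear inequality constraints on the decision variables
$y_s$ and $z_t$.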
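A quick numerical illustration of why vertex checks suffice in the concave case (our own, with a made-up concave function standing in for $\varphi_{y,z}$ on the unit square): for a concave function, the value at a convex combination of vertices is at least the same convex combination of the vertex values, and hence at least their minimum. The sketch evaluates the four vertices and then confirms on random points of the square that no smaller value occurs.

```python
import random


def min_at_vertices(phi, vertices):
    """Smallest value of phi over the vertex list of a polytope."""
    return min(phi(q) for q in vertices)


def random_point_in_hull(vertices, rng):
    """Random convex combination of the vertices, i.e. a point of their convex hull."""
    w = [rng.random() for _ in vertices]
    total = sum(w)
    dim = len(vertices[0])
    return [sum(wi * q[k] for wi, q in zip(w, vertices)) / total for k in range(dim)]


def phi_example(x):
    """Hypothetical concave stand-in for phi_{y,z}: concave quadratic plus affine term."""
    return 1.0 - (x[0] - 0.3) ** 2 - 2.0 * (x[1] - 0.6) ** 2 + 0.5 * x[0]


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    vmin = min_at_vertices(phi_example, square)     # 0.19 > 0: (12) holds on the square
    rng = random.Random(0)
    worst = min(phi_example(random_point_in_hull(square, rng)) for _ in range(1000))
    print(vmin, worst >= vmin)
```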


4.6 Piecewise Polynomial Test Functions 

Another case that can be treated via finitely many constraints is when $\phi_s|_{S_i}$, $\psi_t|_{S_i}$,
and $h|_{S_i}$ are jointly polynomial. The approach of Lasserre [7] and Parrilo [10] can
be applied to turn the copositivity constraint

$$ \langle f_j^i, x \rangle \ge \ell_j^i\ (j = 1, \dots, k_i) \implies \varphi_{y,z}(x) \ge 0 $$

into finitely many linear matrix inequalities. However, this approach is generally
limited to low-dimensional applications.


5 Conclusions 

Our analysis shows that a wide range of duality relations in use in quantitative 
risk management can be understood from the single perspective of a generalized 
duality relation discussed in Sect. 2. An interesting class of special cases is provided 
by formulating a finite number of constraints in the form of bounds on integrals. 
The duals of such models are semi-infinite optimization problems that can often be
reformulated as finite optimization problems, by making use of standard results on
copositivity.
Some Consequences of the Markov Kernel
Perspective of Copulas 


Wolfgang Trutschnig and Juan Fernandez Sanchez 


Abstract The objective of this paper is twofold: After recalling the one-to-one
correspondence between two-dimensional copulas and Markov kernels having the
Lebesgue measure $\lambda$ on [0, 1] as fixed point, we first give a quick survey of some
consequences of this interrelation. In particular, we sketch how Markov kernels can
be used for the construction of strong metrics that strictly distinguish extreme kinds of
statistical dependence, and show how the translation of various well-known copula-related
concepts to the Markov kernel setting opens the door to some surprising
mathematical aspects of copulas. Secondly, we concentrate on the fact that iterates
of the star product of a copula $A$ with itself are Cesàro convergent to an idempotent
copula $\hat{A}$ with respect to any of the strong metrics mentioned before and prove that
$\hat{A}$ must have a very simple form if the Markov operator $T_A$ associated with $A$ is
quasi-constrictive in the sense of Lasota.

Keywords Copula • Doubly stochastic measure • Markov kernel • Markov operator 


1 Introduction 

In 1996, Olsen et al. (see [23]) proved the existence of an isomorphism between
the family $\mathcal{C}$ of two-dimensional copulas (endowed with the so-called star product)
and the family $\mathcal{M}$ of all Markov operators (with the standard composition
as binary operation). Using disintegration (see [29]) allows one to express the
aforementioned Markov operators in terms of Markov kernels, resulting in a one-to-one
correspondence of $\mathcal{C}$ with the family of all Markov kernels having the Lebesgue
measure $\lambda$ on [0, 1] as fixed point. Identifying every copula with its Markov kernel
allows one to define new metrics $D_1$, $D_2$, $D_\infty$ which, contrary to the uniform one, strictly
separate independence from complete dependence (full predictability). Additionally,
the 'translation' of various copula-related concepts from $\mathcal{C}$ to $\mathcal{M}$ and $\mathcal{P}_{\mathcal{C}}$ has proved
useful in so far as it allowed both for alternative simple proofs of already known
properties and for new and interesting results. Section 3 of this paper is a quick
incomplete survey of some useful consequences of this translation. In particular,
we mention the fact that for each copula $A \in \mathcal{C}$, the iterates of the star product of
$A$ with itself are Cesàro convergent to an idempotent copula $\hat{A}$ w.r.t. each of the three
metrics mentioned before, i.e., we have

$$ \lim_{n \to \infty} D_1\big(s_{*n}(A), \hat{A}\big) = 0, $$

whereby $s_{*n}(A) = \frac{1}{n} \sum_{i=1}^n A^{*i}$ for every $n \in \mathbb{N}$. Section 4 contains some new
unpublished results and proves that the idempotent limit copula $\hat{A}$ must have a very
simple (ordinal-sum-like) form if the Markov operator $T_A$ corresponding to $A$ is
quasi-constrictive in the sense of Lasota ([1, 15, 18]).


2 Notation and Preliminaries 


As already mentioned before, $\mathcal{C}$ will denote the family of all (two-dimensional)
copulas, and $d_\infty$ will denote the uniform metric on $\mathcal{C}$. For properties of copulas, we refer
to [8, 22, 26]. For every $A \in \mathcal{C}$, $\mu_A$ will denote the corresponding doubly stochastic
measure, and $\mathcal{P}_{\mathcal{C}}$ the class of all these doubly stochastic measures. Since copulas are
the restriction of two-dimensional distribution functions with $\mathcal{U}(0, 1)$-marginals
to $[0, 1]^2$, the Lebesgue decomposition of every element in $\mathcal{P}_{\mathcal{C}}$ has no discrete
component. The Lebesgue measure on $[0, 1]$ and $[0, 1]^2$ will be denoted by $\lambda$ and $\lambda_2$,
respectively. For every metric space $(\Omega, d)$, the Borel $\sigma$-field on $\Omega$ will be denoted
by $\mathcal{B}(\Omega)$. A Markov kernel from $\mathbb{R}$ to $\mathcal{B}(\mathbb{R})$ is a mapping $K: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0, 1]$
such that $x \mapsto K(x, B)$ is measurable for every fixed $B \in \mathcal{B}(\mathbb{R})$ and $B \mapsto K(x, B)$
is a probability measure for every fixed $x \in \mathbb{R}$. Suppose that $X, Y$ are real-valued
random variables on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$; then a Markov kernel
$K: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0, 1]$ is called regular conditional distribution of $Y$ given $X$ if for
every $B \in \mathcal{B}(\mathbb{R})$

$$ K(X(\omega), B) = \mathbb{E}(\mathbf{1}_B \circ Y \mid X)(\omega) \tag{1} $$

holds $\mathbb{P}$-a.s. It is well known that for each pair $(X, Y)$ of real-valued random variables
a regular conditional distribution $K(\cdot, \cdot)$ of $Y$ given $X$ exists, that $K(\cdot, \cdot)$ is unique
$\mathbb{P}^X$-a.s. (i.e., unique for $\mathbb{P}^X$-almost all $x \in \mathbb{R}$) and that $K(\cdot, \cdot)$ only depends on
$\mathbb{P}^{(X, Y)}$. Hence, given $A \in \mathcal{C}$ we will denote (a version of) the regular conditional
distribution of $Y$ given $X$ by $K_A(\cdot, \cdot)$ and refer to $K_A(\cdot, \cdot)$ simply as regular conditional
distribution of $A$ or as the Markov kernel of $A$. Note that for every $A \in \mathcal{C}$,
its regular conditional distribution $K_A(\cdot, \cdot)$, and every Borel set $G \in \mathcal{B}([0, 1]^2)$ we
have

$$ \int_{[0,1]} K_A(x, G_x)\, d\lambda(x) = \mu_A(G), \tag{2} $$

whereby $G_x := \{y \in [0, 1] : (x, y) \in G\}$ for every $x \in [0, 1]$. Hence, as special
case,

$$ \int_{[0,1]} K_A(x, F)\, d\lambda(x) = \lambda(F) \tag{3} $$

for every $F \in \mathcal{B}([0, 1])$. On the other hand, every Markov kernel
$K: [0, 1] \times \mathcal{B}([0, 1]) \to [0, 1]$ fulfilling (3) induces a unique element $\mu \in \mathcal{P}_{\mathcal{C}}$ via
(2). For more details and properties of conditional expectation, regular conditional
distributions, and disintegration see [13, 14].

$\mathcal{T}$ will denote the family of all $\lambda$-preserving transformations $h: [0, 1] \to [0, 1]$
(see [34]), and $\mathcal{T}_p$ the subset of all bijective $h \in \mathcal{T}$. A copula $A \in \mathcal{C}$ will be called
completely dependent if and only if there exists $h \in \mathcal{T}$ such that $K(x, E) := \mathbf{1}_E(h(x))$
is a regular conditional distribution of $A$ (see [17, 29] for equivalent definitions and
main properties). For every $h \in \mathcal{T}$, the corresponding completely dependent copula
will be denoted by $C_h$, and the class of all completely dependent copulas by $\mathcal{C}_d$.

A linear operator $T$ on $L^1([0, 1]) := L^1([0, 1], \mathcal{B}([0, 1]), \lambda)$ is called Markov
operator ([3, 23]) if it fulfills the following three properties:

1. $T$ is positive, i.e., $T(f) \ge 0$ whenever $f \ge 0$

2. $T(\mathbf{1}_{[0,1]}) = \mathbf{1}_{[0,1]}$

3. $\int_{[0,1]} (Tf)(x)\, d\lambda(x) = \int_{[0,1]} f(x)\, d\lambda(x)$

As mentioned in the introduction, $\mathcal{M}$ will denote the class of all Markov operators
on $L^1([0, 1])$. It is straightforward to see that the operator norm of $T$ is one, i.e.,
$\|T\| := \sup\{\|Tf\|_1 : \|f\|_1 \le 1\} = 1$ holds. According to [23] there is a one-to-one
correspondence between $\mathcal{C}$ and $\mathcal{M}$; in fact, the mappings $\Phi: \mathcal{C} \to \mathcal{M}$ and
$\Psi: \mathcal{M} \to \mathcal{C}$ defined by

$$
\begin{aligned}
\Phi(A)(f)(x) &:= (T_A f)(x) := \frac{d}{dx} \int_{[0,1]} A_{,2}(x, t) f(t)\, d\lambda(t),\\
\Psi(T)(x, y) &:= A_T(x, y) := \int_{[0,x]} (T \mathbf{1}_{[0,y]})(t)\, d\lambda(t)
\end{aligned}
\tag{4}
$$

for every $f \in L^1([0, 1])$ and $(x, y) \in [0, 1]^2$ ($A_{,2}$ denoting the partial derivative of
$A$ w.r.t. $y$), fulfill $\Psi \circ \Phi = \mathrm{id}_{\mathcal{C}}$ and $\Phi \circ \Psi = \mathrm{id}_{\mathcal{M}}$. Note that in case of $f := \mathbf{1}_{[0,y]}$
we have $(T_A \mathbf{1}_{[0,y]})(x) = A_{,1}(x, y)$ $\lambda$-a.s. According to [29] the first equality in (4)
can be simplified to

$$ (T_A f)(x) = \mathbb{E}(f \circ Y \mid X = x) = \int_{[0,1]} f(y)\, K_A(x, dy) \quad \lambda\text{-a.s.} \tag{5} $$

It is not difficult to show that the uniform metric $d_\infty$ is a metrization of the weak
operator topology on $\mathcal{M}$ (see [23]).


3 Some Consequences of the Markov Kernel Approach 

In this section, we give a quick survey showing the usefulness of the Markov kernel 
perspective of two-dimensional copulas. 


3.1 Strong Metrics on $\mathcal{C}$

Expressing copulas in terms of their corresponding Markov kernels, the metrics $D_1$, $D_2$, $D_\infty$ on $\mathcal{C}$ can be defined as follows:

$$D_1(A,B) := \int_{[0,1]}\int_{[0,1]}\big|K_A(x,[0,y]) - K_B(x,[0,y])\big|\,d\lambda(x)\,d\lambda(y)\qquad (6)$$

$$D_2^2(A,B) := \int_{[0,1]}\int_{[0,1]}\big|K_A(x,[0,y]) - K_B(x,[0,y])\big|^2\,d\lambda(x)\,d\lambda(y)\qquad (7)$$

$$D_\infty(A,B) := \sup_{y\in[0,1]}\int_{[0,1]}\big|K_A(x,[0,y]) - K_B(x,[0,y])\big|\,d\lambda(x)\qquad (8)$$
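The metrics (6)-(8) can be approximated numerically once the conditional distribution functions $y\mapsto K_A(x,[0,y])$ are available in closed form. The following minimal sketch (the function names are hypothetical) replaces the integrals by midpoint sums on an $n\times n$ grid and, following (7), returns $D_2$ as the square root of the double integral. It is checked on $M$ and $\Pi$, for which elementary integration gives $D_1(M,\Pi) = 1/3$, $D_2(M,\Pi) = 1/\sqrt{6}$ and $D_\infty(M,\Pi) = 1/2$ (the inner integral in $x$ equals $2y(1-y)$).

```python
import numpy as np


def conditioning_metrics(K_A, K_B, n=1000):
    """Approximate D_1, D_2 and D_infty of (6)-(8) by midpoint sums.

    K_A(x, y) and K_B(x, y) must return K(x, [0, y]) for numpy arrays x, y.
    """
    t = (np.arange(n) + 0.5) / n                  # midpoints of an n-cell partition
    X, Y = np.meshgrid(t, t, indexing="ij")       # rows: x, columns: y
    diff = np.abs(K_A(X, Y) - K_B(X, Y))
    d1 = diff.mean()                              # (6): double integral
    d2 = np.sqrt((diff ** 2).mean())              # (7) defines the square of D_2
    d_inf = diff.mean(axis=0).max()               # (8): integrate over x, sup over y
    return d1, d2, d_inf


def K_M(x, y):
    """Kernel of the comonotone copula M (Y = X): K(x, [0, y]) = 1{x <= y}."""
    return (x <= y).astype(float)


def K_Pi(x, y):
    """Kernel of the independence copula Pi: K(x, [0, y]) = y."""
    return y + 0.0 * x


print(conditioning_metrics(K_M, K_Pi))   # approximately (1/3, 1/sqrt(6), 1/2)
```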

The following two theorems state the most important properties of the metrics $D_1$, $D_2$ and $D_\infty$.

Theorem 1 ([29]) Suppose that $A, A_1, A_2, \ldots$ are copulas and let $T, T_1, T_2, \ldots$ denote the corresponding Markov operators. Then the following four conditions are equivalent:

(a) $\lim_{n\to\infty}D_1(A_n, A) = 0$

(b) $\lim_{n\to\infty}D_\infty(A_n, A) = 0$

(c) $\lim_{n\to\infty}\|T_nf - Tf\|_1 = 0$ for every $f\in L^1([0,1])$

(d) $\lim_{n\to\infty}D_2(A_n, A) = 0$

As a consequence, each of the three metrics $D_1$, $D_2$ and $D_\infty$ is a metrization of the strong operator topology on $\mathcal{M}$.
Theorem 2 ([29]) The metric space $(\mathcal{C}, D_1)$ is complete and separable. The same holds for $(\mathcal{C}, D_2)$ and $(\mathcal{C}, D_\infty)$. The topology induced on $\mathcal{C}$ by $D_1$ is strictly finer than the one induced by $d_\infty$.

Remark 3 The idea of constructing metrics via conditioning on the first coordinate can easily be extended to the family $\mathcal{C}^m$ of all $m$-dimensional copulas for arbitrary $m\geq 3$. For instance, the multivariate version of $D_1$ on $\mathcal{C}^m$ can be defined by

$$D_1(A,B) := \int_{[0,1]^{m-1}}\int_{[0,1]}\big|K_A(x,[\mathbf{0},\mathbf{y}]) - K_B(x,[\mathbf{0},\mathbf{y}])\big|\,d\lambda(x)\,d\lambda^{m-1}(\mathbf{y}),$$

whereby $[\mathbf{0},\mathbf{y}] = \times_{i=1}^{m-1}[0,y_i]$ and $K_A$ ($K_B$) denotes the Markov kernel (regular conditional distribution) of $\mathbf{Y}$ given $X$ for $(X,\mathbf{Y})\sim A$ ($B$). As shown in [11], the resulting metric spaces $(\mathcal{C}^m, D_1)$, $(\mathcal{C}^m, D_2)$, $(\mathcal{C}^m, D_\infty)$ are again complete and separable.


3.2 Induced Dependence Measures

The main motivation for considering conditioning-based metrics like $D_1$ was the need for a metric that, contrary to $d_\infty$, is capable of distinguishing the extreme types of statistical dependence, i.e., independence and complete dependence. For the uniform metric $d_\infty$, it is straightforward to construct sequences $(C_{h_n})_{n\in\mathbb{N}}$ of completely dependent copulas (in fact, even sequences of shuffles of $M$, see [9, 22]) fulfilling $\lim_{n\to\infty}d_\infty(C_{h_n},\Pi) = 0$; for $D_1$, however, the following result holds:

Theorem 4 ([29]) For every $A\in\mathcal{C}$ we have $D_1(A,\Pi)\leq 1/3$. Furthermore, equality $D_1(A,\Pi) = 1/3$ holds if and only if $A\in\mathcal{C}_d$.

As a straightforward consequence, we may define $\zeta_1:\mathcal{C}\to[0,1]$ by

$$\zeta_1(A) := 3D_1(A,\Pi).\qquad (9)$$

This dependence measure $\zeta_1$ exhibits the seemingly natural properties that (i) exactly the members of the family $\mathcal{C}_d$ (describing complete dependence) are assigned maximum dependence (equal to one) and (ii) $\Pi$ is the only copula with minimum dependence (equal to zero). Note that (i) means that $\zeta_1(A)$ is maximal if and only if $A$ describes the situation of full predictability, i.e., asset $Y$ is a deterministic function of asset $X$. In particular, all shuffles of $M$ have maximum dependence. Dependence measures based on the metric $D_2$ may be constructed analogously.

Example 5 For the Farlie-Gumbel-Morgenstern family $(G_\theta)_{\theta\in[-1,1]}$ of copulas (see [22]), given by

$$G_\theta(x,y) = xy + \theta xy(1-x)(1-y),\qquad (10)$$

it is straightforward to show that $\zeta_1(G_\theta) = |\theta|/4$ holds for every $\theta\in[-1,1]$ (for details see [29]).
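Since $K_{G_\theta}(x,[0,y]) = \partial_xG_\theta(x,y) = y + \theta y(1-y)(1-2x)$, the value in Example 5 follows from $D_1(G_\theta,\Pi) = |\theta|\int_0^1 y(1-y)\,dy\int_0^1|1-2x|\,dx = |\theta|\cdot\frac16\cdot\frac12 = |\theta|/12$, hence $\zeta_1(G_\theta) = 3|\theta|/12 = |\theta|/4$. The sketch below checks this numerically with a midpoint rule (the function names are ours) and also confirms that the comonotone copula $M\in\mathcal{C}_d$ attains the maximal value 1 from Theorem 4.

```python
import numpy as np


def zeta1(K, n=1000):
    """zeta_1(A) = 3 * D_1(A, Pi), approximated on an n x n midpoint grid.

    K(x, y) must return K_A(x, [0, y]), the partial derivative of A w.r.t. x.
    """
    t = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(t, t, indexing="ij")
    return 3.0 * np.abs(K(X, Y) - Y).mean()     # K_Pi(x, [0, y]) = y


def fgm_kernel(theta):
    """K(x, [0, y]) = d/dx G_theta(x, y) = y + theta * y * (1 - y) * (1 - 2x)."""
    def K(x, y):
        return y + theta * y * (1 - y) * (1 - 2 * x)
    return K


for theta in (-1.0, -0.4, 0.25, 1.0):
    print(theta, zeta1(fgm_kernel(theta)), abs(theta) / 4)
```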


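Example 6 For the Marshall-Olkin family $(M_{\alpha,\beta})_{(\alpha,\beta)\in[0,1]^2}$ of copulas (see [22]), given by

$$M_{\alpha,\beta}(x,y) = \begin{cases}x^{1-\alpha}\,y & \text{if } x^\alpha\geq y^\beta,\\ x\,y^{1-\beta} & \text{if } x^\alpha < y^\beta,\end{cases}\qquad (11)$$

it can be shown that

$$\zeta_1(M_{\alpha,\beta}) = \frac{3}{\alpha}\left(\frac{1-(1-\alpha)^{z+1}}{z+1} - (1-\alpha)\,\frac{1-(1-\alpha)^{z}}{z}\right)\qquad (12)$$

holds for $\alpha,\beta\in(0,1]$, whereby $z = \frac{1}{\alpha} + \frac{2}{\beta} - 1$ (for details again see [29]).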
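The closed form for $\zeta_1(M_{\alpha,\beta})$ can be cross-checked numerically. Differentiating (11) in $x$ gives $K_{M_{\alpha,\beta}}(x,[0,y]) = (1-\alpha)x^{-\alpha}y$ if $x^\alpha\geq y^\beta$ and $y^{1-\beta}$ otherwise. The sketch below integrates $|K - y|$ on a midpoint grid and compares $3D_1$ with $\frac{3}{\alpha}\big(\frac{1-(1-\alpha)^{z+1}}{z+1} - (1-\alpha)\frac{1-(1-\alpha)^z}{z}\big)$, $z = 1/\alpha + 2/\beta - 1$, for a few parameter pairs chosen by us. Because the integrand jumps along the curve $y = x^{\alpha/\beta}$, agreement is only expected up to a discretization error of order $1/n$.

```python
import numpy as np


def mo_kernel(alpha, beta):
    """K(x, [0, y]) = d/dx M_{alpha,beta}(x, y) for the Marshall-Olkin copula (11)."""
    def K(x, y):
        lower = x ** alpha >= y ** beta              # region where M = x^(1-alpha) * y
        return np.where(lower, (1 - alpha) * x ** (-alpha) * y, y ** (1 - beta))
    return K


def zeta1_numeric(K, n=1000):
    """3 * D_1(A, Pi) via a midpoint rule; all grid points satisfy x > 0."""
    t = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(t, t, indexing="ij")
    return 3.0 * np.abs(K(X, Y) - Y).mean()


def zeta1_closed(alpha, beta):
    """Closed form with z = 1/alpha + 2/beta - 1, for alpha, beta in (0, 1]."""
    z = 1 / alpha + 2 / beta - 1
    q = 1 - alpha
    return 3 / alpha * ((1 - q ** (z + 1)) / (z + 1) - q * (1 - q ** z) / z)


for alpha, beta in [(0.5, 0.5), (0.3, 0.8), (0.9, 0.2), (1.0, 1.0)]:
    print(alpha, beta, zeta1_numeric(mo_kernel(alpha, beta)), zeta1_closed(alpha, beta))
```

For $\alpha = \beta = 1$ the copula is $M$ and both values equal 1, in line with Theorem 4.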

Remark 7 The dependence measure $\zeta_1$ is nonmutual, i.e., we do not necessarily have $\zeta_1(A) = \zeta_1(A^t)$, whereby $A^t$ denotes the transpose of $A$ (i.e., $A^t(x,y) = A(y,x)$). This reflects the fact that the dependence structure of random variables might be strongly asymmetric, see [29] for examples as well as [27] for a measure of mutual dependence.

Remark 8 Since most properties of $D_1$ in dimension two also hold in the general $m$-dimensional setting, it might seem natural to simply consider $\zeta_1(A) := aD_1(A,\Pi)$ as dependence measure on $\mathcal{C}^m$ ($a$ being a normalizing constant). It is, however, straightforward to see that this yields no reasonable notion of dependence quantification, insofar as we would also have $\zeta_1(A) > 0$ for copulas $A$ describing independence of $X$ and $\mathbf{Y} = (Y_1,\ldots,Y_{m-1})$. For a possible way to overcome this problem and to assign maximum dependence to copulas describing the situation in which each component of a portfolio $(Y_1,\ldots,Y_{m-1})$ is a deterministic function of another asset $X$, we refer to [11].

Remark 9 It is straightforward to verify that for samples $(X_1,Y_1),\ldots,(X_n,Y_n)$ from $A\in\mathcal{C}$ the empirical copula $E_n$ (see [22, 28]) cannot converge to $A$ w.r.t. $D_1$ unless $A\in\mathcal{C}_d$. Using Bernstein or checkerboard aggregations (smoothing the empirical copula) might make it possible to construct $D_1$-consistent estimators of $\zeta_1(A)$. Convergence rates of these aggregations and other related questions are left for future work.


3.3 The IFS Construction of (Very) Singular Copulas 

Using Iterated Function Systems, one can construct copulas exhibiting surprisingly 
irregular analytic behavior. The aim of this section is to sketch the construction and 
then state two main results. For general background on Iterated Function Systems 
with Probabilities (IFSP, for short), we refer to [16]. The IFSP construction of two- 
dimensional copulas with fractal support goes back to [12] (also see [2]), for the 
generalization to the multivariate setting we refer to [30]. 
Definition 10 ([12]) An $n\times m$-matrix $\tau = (t_{ij})_{i=1,\ldots,n,\ j=1,\ldots,m}$ is called transformation matrix if it fulfills the following four conditions: (i) $\max(n,m)\geq 2$, (ii) all entries are non-negative, (iii) $\sum_{i,j}t_{ij} = 1$, and (iv) no row or column has all entries 0. $\mathfrak{T}$ will denote the family of all transformation matrices.

Given $\tau\in\mathfrak{T}$ define the vectors $(a_j)_{j=0}^m$, $(b_i)_{i=0}^n$ of cumulative column and row sums by $a_0 = b_0 = 0$ and

$$a_j = \sum_{j_0\leq j}\sum_{i=1}^n t_{ij_0},\quad j\in\{1,\ldots,m\},\qquad b_i = \sum_{i_0\leq i}\sum_{j=1}^m t_{i_0j},\quad i\in\{1,\ldots,n\}.\qquad (13)$$

Since $\tau$ is a transformation matrix, both $(a_j)_{j=0}^m$ and $(b_i)_{i=0}^n$ are strictly increasing and $R_{ji} := [a_{j-1},a_j]\times[b_{i-1},b_i]$ is a compact rectangle with nonempty interior for all $j\in\{1,\ldots,m\}$ and $i\in\{1,\ldots,n\}$. Set $I := \{(i,j) : t_{ij} > 0\}$ and consider the IFSP $\{[0,1]^2, (f_{ji})_{(i,j)\in I}, (t_{ij})_{(i,j)\in I}\}$, whereby the affine contraction $f_{ji}:[0,1]^2\to R_{ji}$ is given by

$$f_{ji}(x,y) = \big(a_{j-1} + x(a_j - a_{j-1}),\ b_{i-1} + y(b_i - b_{i-1})\big).\qquad (14)$$

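$Z^\star\in\mathcal{K}([0,1]^2)$ (the family of nonempty compact subsets of $[0,1]^2$) will denote the attractor of the IFSP (see [16]). The induced operator $\mathcal{V}_\tau$ on the family $\mathcal{P}([0,1]^2)$ of probability measures on $[0,1]^2$ is defined by

$$\mathcal{V}_\tau(\mu) := \sum_{j=1}^m\sum_{i=1}^n t_{ij}\,\mu^{f_{ji}} = \sum_{(i,j)\in I}t_{ij}\,\mu^{f_{ji}},\qquad (15)$$

where $\mu^{f}$ denotes the push-forward of $\mu$ under $f$. It is straightforward to see that $\mathcal{V}_\tau$ maps the family of doubly stochastic measures (i.e., of measures $\mu_A$ with $A\in\mathcal{C}$) into itself, so we may view $\mathcal{V}_\tau$ also as an operator on $\mathcal{C}$. According to [12] there is exactly one copula $A^*_\tau\in\mathcal{C}$, to which we will refer as invariant copula, such that $\mathcal{V}_\tau(\mu_{A^*_\tau}) = \mu_{A^*_\tau}$ holds. The IFSP construction also converges w.r.t. $D_1$; the following result holds: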
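The bookkeeping in (13)-(15) is easy to mechanize. In the sketch below (hypothetical helper names, 0-based indices so that `a[j]` plays the role of $a_{j-1}$), `ifs_maps` computes the cumulative sums and the contractions $f_{ji}$ for $(i,j)\in I$, and `apply_V` applies $\mathcal{V}_\tau$ to a discrete measure $\sum_k w_k\delta_{p_k}$ by pushing every atom forward under every $f_{ji}$ and reweighting with $t_{ij}$. The $3\times 3$ matrix used in the test is an illustrative choice of ours and is not claimed to be the matrix of Example 12.

```python
import numpy as np


def ifs_maps(tau):
    """Cumulative sums (13) and affine contractions (14) of a transformation matrix."""
    tau = np.asarray(tau, dtype=float)
    a = np.concatenate(([0.0], np.cumsum(tau.sum(axis=0))))   # column sums: x-direction
    b = np.concatenate(([0.0], np.cumsum(tau.sum(axis=1))))   # row sums: y-direction
    maps = {}
    for i, j in np.argwhere(tau > 0).tolist():                 # index set I
        def f(x, y, i=i, j=j):
            return (a[j] + x * (a[j + 1] - a[j]), b[i] + y * (b[i + 1] - b[i]))
        maps[(i, j)] = f
    return a, b, maps


def apply_V(tau, points, weights):
    """One application of (15) to the discrete measure sum_k weights[k] * delta_{points[k]}."""
    tau = np.asarray(tau, dtype=float)
    _, _, maps = ifs_maps(tau)
    new_pts, new_w = [], []
    for (i, j), f in maps.items():
        x, y = f(points[:, 0], points[:, 1])
        new_pts.append(np.column_stack([x, y]))
        new_w.append(tau[i, j] * weights)
    return np.vstack(new_pts), np.concatenate(new_w)


tau = [[1 / 6, 0, 1 / 6], [0, 1 / 3, 0], [1 / 6, 0, 1 / 6]]   # illustrative only
pts, w = np.array([[0.5, 0.5]]), np.array([1.0])
for _ in range(3):
    pts, w = apply_V(tau, pts, w)
print(len(w), w.sum())          # 5^3 atoms carrying total mass 1
```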

Theorem 11 ([29]) Let $\tau\in\mathfrak{T}$ be a transformation matrix. Then $\mathcal{V}_\tau$ is a contraction on the metric space $(\mathcal{C}, D_1)$ and there exists a unique copula $A^*$ such that $\mathcal{V}_\tau A^* = A^*$, and for every $B\in\mathcal{C}$ we have $\lim_{n\to\infty}D_1(\mathcal{V}_\tau^nB, A^*) = 0$.
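Theorem 11 can also be watched at work on checkerboards. If all row sums and all column sums of $\tau$ coincide (an assumption made here only to keep the grid uniform), every rectangle $R_{ji}$ is a square of side $1/m$, and applying $\mathcal{V}_\tau$ to a checkerboard copula with mass matrix $P$ (rows indexing $y$, columns indexing $x$, as in (14)) produces the mass matrix $\tau\otimes P$ (Kronecker product). Starting from $\Pi$, the sketch below therefore obtains $\mathcal{V}_\tau^n(\Pi)$ as the $n$-fold Kronecker power of $\tau$, checks that its marginals are uniform, and counts the occupied cells at resolution $3^{-n}$; for a matrix with five positive entries the count is $5^n$, consistent with the dimension $\ln 5/\ln 3$ quoted in Example 18. The matrix is again a hypothetical example.

```python
import numpy as np


def V_tau_power_of_Pi(tau, n):
    """Mass matrix of V_tau^n(Pi) (rows: y-cells, columns: x-cells).

    Assumes equal row and column sums of tau, so the grid stays uniform.
    """
    P = np.array([[1.0]])              # Pi seen as a 1 x 1 checkerboard
    for _ in range(n):
        P = np.kron(tau, P)            # rectangle R_ji receives t_ij times a copy of P
    return P


tau = np.array([[1 / 6, 0, 1 / 6], [0, 1 / 3, 0], [1 / 6, 0, 1 / 6]])  # illustrative only
n = 4
P = V_tau_power_of_Pi(tau, n)
occupied = np.count_nonzero(P)
print(P.shape, occupied, np.log(occupied) / np.log(3 ** n), np.log(5) / np.log(3))
```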

Example 12 Figure 1 depicts the density of $\mathcal{V}_\tau^n(\Pi)$ for $n\in\{1,2,3,5\}$, whereby $\tau$ is given by

$\tau =$

[Figure 1: image plots; the recoverable panels are "Step 3" and "Step 5", both axes running from 0 to 1 with ticks at 1/3 and 2/3]

Fig. 1 Image plot of the density of $\mathcal{V}_\tau^n(\Pi)$ for $n\in\{1,2,3,5\}$ and $\tau$ according to Example 12

Moreover (again see [12]) the support $\mathrm{Supp}(\mu_{A^*_\tau})$ of $\mu_{A^*_\tau}$ fulfills $\lambda_2(\mathrm{Supp}(\mu_{A^*_\tau})) = 0$ if $\tau$ contains at least one zero. Hence, in this case, $\mu_{A^*_\tau}$ is singular w.r.t. the Lebesgue measure $\lambda_2$; we write $\mu_{A^*_\tau}\perp\lambda_2$. On the other hand, if $\tau$ contains no zeros we may still have $\mu_{A^*_\tau}\perp\lambda_2$, although in this case $\mu_{A^*_\tau}$ has full support $[0,1]^2$. In fact, an even stronger and quite surprising singularity result holds. Letting $\hat{\mathfrak{T}}$ denote the family of all transformation matrices $\tau$ (i) containing no zeros, (ii) fulfilling that the row sums and column sums through every $t_{ij}$ are identical, and (iii) fulfilling $\mu_{A^*_\tau}\neq\lambda_2$, we have the following striking result:

Theorem 13 ([33]) Suppose that $\tau\in\hat{\mathfrak{T}}$. Then the corresponding invariant copula $A^*_\tau$ is singular w.r.t. $\lambda_2$ and has full support $[0,1]^2$. Moreover, for $\lambda$-almost every $x\in[0,1]$ the conditional distribution function $y\mapsto F_x^{A^*_\tau}(y) = K_{A^*_\tau}(x,[0,y])$ is continuous, strictly increasing and has derivative zero $\lambda$-almost everywhere.


3.4 The Star Product of Copulas 

Given $A, B\in\mathcal{C}$, the star product $A*B\in\mathcal{C}$ is defined by (see [3, 23])

$$(A*B)(x,y) := \int_{[0,1]}A_{,2}(x,t)\,B_{,1}(t,y)\,d\lambda(t)\qquad (16)$$

and fulfills $T_{A*B} = \Phi(A*B) = \Phi(A)\circ\Phi(B) = T_A\circ T_B$, so the mapping $\Phi$ in equation (4) is actually an isomorphism between the semigroups $(\mathcal{C},*)$ and $(\mathcal{M},\circ)$. A copula $A\in\mathcal{C}$ is called idempotent if $A*A = A$ holds; the family of all idempotent copulas will be denoted by $\mathcal{C}_{ip}$. For a complete characterization of idempotent copulas we refer to [4] (also see [26]). The star product can easily be translated to the Markov kernel setting; the following result holds:

Lemma 14 ([30]) Suppose that $A, B\in\mathcal{C}$ and let $K_A$, $K_B$ denote Markov kernels of $A$ and $B$. Then the Markov kernel $K_A\circ K_B$, defined by

$$(K_A\circ K_B)(x,F) := \int_{[0,1]}K_B(y,F)\,K_A(x,dy),\qquad (17)$$

is a Markov kernel of $A*B$. Furthermore $\mathcal{C}_{ip}$ is closed in $(\mathcal{C}, D_1)$.

Remark 15 Let $A\in\mathcal{C}$ be arbitrary. If $(X_n)_{n\in\mathbb{N}}$ is a stationary Markov process on $[0,1]$ with (stationary) transition probability $K_A(\cdot,\cdot)$ and $X_1\sim\mathcal{U}(0,1)$, then $(X_n, X_{n+1})\sim A$ for every $n\in\mathbb{N}$, and Lemma 14 implies that $(X_1, X_{n+1})\sim A*A*\cdots*A =: A^{*n}$, i.e., the $n$-step transition probability of the process is given by the Markov kernel of $A^{*n}$.

Remark 16 In case the copulas $A, B$ are absolutely continuous with densities $k_A$ and $k_B$, it is straightforward to verify that $A*B$ is absolutely continuous with density $k_{A*B}$ given by

$$k_{A*B}(x,y) = \int_{[0,1]}k_A(x,z)\,k_B(z,y)\,d\lambda(z).\qquad (18)$$
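For checkerboard copulas the star product reduces to matrix multiplication, which is the discrete picture behind the statement that $*$ generalizes the product of doubly stochastic matrices. If $A$ and $B$ have $N\times N$ mass matrices $P$ and $Q$ (rows indexing the first coordinate), their densities are $N^2P$ and $N^2Q$ cellwise, and (18) yields the cellwise density $N^3PQ$, i.e., the mass matrix $N\,PQ$ for $A*B$. The sketch below (hypothetical helper `star`) checks that $\Pi$ acts as a null element, $M$ as the unit element, and that a block-averaging checkerboard is idempotent.

```python
import numpy as np


def star(P, Q):
    """Mass matrix of A * B for checkerboard copulas with N x N mass matrices P, Q.

    Derived from (18): density N^2 P times density N^2 Q, integrated over
    cells of width 1/N, gives density N^3 (P @ Q), i.e. mass N * (P @ Q).
    """
    N = P.shape[0]
    return N * P @ Q


N = 3
P = np.array([[2, 1, 0], [1, 1, 1], [0, 1, 2]]) / 9         # hypothetical checkerboard
Pi = np.full((N, N), 1 / N ** 2)                             # independence copula
M = np.eye(N) / N                                            # checkerboard approximation of M
E = np.array([[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 1]]) / N  # block-averaging copula
print(star(P, Pi), star(E, E), sep="\n")
```

In this dictionary, the $n$-step transition copula $A^{*n}$ of Remark 15 corresponds to the matrix power $N^{n-1}P^n$.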


Since the star product of copulas is a natural generalization of the multiplication 
of doubly stochastic matrices and doubly stochastic idempotent matrices are fully 
characterizable (see [10, 25]) the following result underlines how much more com- 
plex the family of idempotent copulas is (also see [12] for the original result without 
idempotence). 

Theorem 17 ([30]) For every $s\in(1,2)$ there exists a transformation matrix $\tau_s\in\mathfrak{T}$ such that:

1. The invariant copula $A^*_{\tau_s}$ is idempotent.

2. The Hausdorff dimension of the support of $A^*_{\tau_s}$ is $s$.

Example 18 For the transformation matrix $\tau$ from Example 12 the invariant copula $A^*_\tau$ is idempotent and its support has Hausdorff dimension $\ln 5/\ln 3$. Hence, setting $A := A^*_\tau$ and considering the Markov process outlined in Remark 15, we have $(X_i, X_{i+n})\sim A$ for all $i, n\in\mathbb{N}$. The same holds if we take $A := \mathcal{V}_\tau^j(\Pi)$ for arbitrary $j\in\mathbb{N}$, since this $A$ is idempotent too.
We conclude this section with a general result that will be used later on and which, essentially, follows from von Neumann's mean ergodic theorem for Hilbert spaces (see [24]) since Markov operators have operator norm one. For every copula $A\in\mathcal{C}$ and every $n\in\mathbb{N}$, as in the Introduction, we set

$$s^*_n(A) := \frac{1}{n}\sum_{i=1}^nA^{*i}.\qquad (19)$$

Theorem 19 ([32]) For every copula $A$ there exists a copula $\hat{A}$ such that

$$\lim_{n\to\infty}D_1\big(s^*_n(A),\hat{A}\big) = 0.\qquad (20)$$

This copula $\hat{A}$ is idempotent, symmetric, and fulfills $A*\hat{A} = \hat{A}*A = \hat{A}$.

As a nice by-product, Theorem 19 also offers a very simple proof of the fact that idempotent copulas are necessarily symmetric (originally proved in [4]): if $A$ is idempotent, then $s^*_n(A) = A$ for every $n$, so $A = \hat{A}$ is symmetric.
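Theorem 19 can be illustrated with the checkerboard dictionary used above: writing $S = NP$ for the doubly stochastic matrix of a checkerboard copula, $A^{*i}$ corresponds to $S^i$, so $s^*_n(A)$ corresponds to the Cesàro mean $\frac1n\sum_{i=1}^nS^i$. In the sketch below (a toy example of ours) $S$ is not symmetric, yet the Cesàro means approach a symmetric idempotent matrix $E$ with $SE = ES = E$, mirroring the properties of $\hat{A}$. For finite $n$ these equalities only hold approximately, so the checks use a tolerance.

```python
import numpy as np


def cesaro_mean(S, n):
    """(1/n) * sum_{i=1}^n S^i for a doubly stochastic matrix S."""
    acc, power = np.zeros_like(S), np.eye(len(S))
    for _ in range(n):
        power = power @ S
        acc += power
    return acc / n


S = np.array([[0.5, 0.5, 0.0],      # doubly stochastic but not symmetric
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
E = cesaro_mean(S, 3000)
print(np.round(E, 4))               # close to the constant matrix 1/3, i.e. to Pi
```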


4 Copulas Whose Corresponding Markov Operator Is Quasi-constrictive

When studying asymptotic properties of Markov operators, quasi-constrictiveness is a very important concept. To the best of the authors' knowledge, there is no natural/simple characterization of copulas whose Markov operator is quasi-constrictive. The objective of this section, however, is to show that the $D_1$-limit $\hat{A}$ of $s^*_n(A)$ has a very simple form if $T_A$ is quasi-constrictive. We start with a definition of quasi-constrictiveness in the general setting. In general, $T$ is a Markov operator on $L^1(\Omega,\mathcal{A},\mu)$ if the conditions (M1)-(M3) from Sect. 2 hold with $[0,1]$ replaced by $\Omega$, $\mathcal{B}([0,1])$ replaced by $\mathcal{A}$, and $\lambda$ replaced by $\mu$.

Definition 20 ([1, 15, 18]) Suppose that $(\Omega,\mathcal{A},\mu)$ is a finite measure space and let $\mathcal{D}(\Omega,\mathcal{A},\mu)$ denote the family of all probability densities w.r.t. $\mu$. Then a Markov operator $T: L^1(\Omega,\mathcal{A},\mu)\to L^1(\Omega,\mathcal{A},\mu)$ is called quasi-constrictive if there exist constants $\delta > 0$ and $\kappa < 1$ such that for every probability density $f\in\mathcal{D}(\Omega,\mathcal{A},\mu)$ the following inequality is fulfilled:

$$\limsup_{n\to\infty}\int_ET^nf(x)\,d\mu(x)\leq\kappa\quad\text{for every } E\in\mathcal{A}\text{ with }\mu(E)\leq\delta\qquad (21)$$

Komornik and Lasota (see [15]) showed in 1987 that quasi-constrictivity is equivalent to asymptotic periodicity; in particular, they proved the following spectral decomposition theorem: For every quasi-constrictive Markov operator $T$ there exist an integer $r\geq 1$, densities $g_1,\ldots,g_r\in\mathcal{D}(\Omega,\mathcal{A},\mu)$ with pairwise disjoint support, essentially bounded non-negative functions $h_1,\ldots,h_r\in L^\infty(\Omega,\mathcal{A},\mu)$ and a permutation $\sigma$ of $\{1,\ldots,r\}$ such that for every $f\in L^1(\Omega,\mathcal{A},\mu)$

$$T^nf(x) = \sum_{i=1}^r\Big(\int_\Omega fh_i\,d\mu\Big)g_{\sigma^n(i)}(x) + R_nf(x)\quad\text{with}\quad\lim_{n\to\infty}\|R_nf\|_1 = 0\qquad (22)$$

holds. Furthermore (see again [1, 15, 18]), in case of $\mu(\Omega) = 1$ there exists a measurable partition $(E_j)_{j=1}^r$ of $\Omega$ in sets with positive measure such that $g_j$ and $\sigma$ in (22) fulfill

$$g_j = \frac{1}{\mu(E_j)}\mathbf{1}_{E_j}\quad\text{and}\quad\mu(E_j) = \mu(E_{\sigma^n(j)})\qquad (23)$$

for every $j\in\{1,\ldots,r\}$ and every $n\in\mathbb{N}$.

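Example 21 For every absolutely continuous copula $A$ with density $k_A$ fulfilling $k_A\leq M$, the corresponding Markov operator is quasi-constrictive. This directly follows from the fact that

$$T_Af(x) = \int_{[0,1]}f(y)\,K_A(x,dy) = \int_{[0,1]}f(y)\,k_A(x,y)\,dy\leq M$$

holds for every $f\in\mathcal{D}([0,1]) := \mathcal{D}([0,1],\mathcal{B}([0,1]),\lambda)$: since $T_A^{n-1}f$ is again a density, we get $\int_ET_A^nf\,d\lambda\leq M\lambda(E)$, so (21) holds, e.g., with $\delta := 1/(2M)$ and $\kappa := 1/2$.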

Example 22 There are absolutely continuous copulas $A$ whose corresponding Markov operator is not quasi-constrictive; one example is the idempotent ordinal-sum-like copula $O$ with unbounded density $k_O$ defined by

$$k_O(x,y) := \sum_{n=1}^\infty 2^n\,\mathbf{1}_{[1-2^{1-n},\,1-2^{-n})^2}(x,y)$$

for all $x, y\in[0,1]$ (straightforward to verify).

Before returning to the copula setting we prove a first supplement to the spectral decomposition that holds for general Markov operators on $L^1(\Omega,\mathcal{A},\mu)$ with $(\Omega,\mathcal{A},\mu)$ being a probability space.

Lemma 23 Suppose that $(\Omega,\mathcal{A},\mu)$ is a probability space and that $T: L^1(\Omega,\mathcal{A},\mu)\to L^1(\Omega,\mathcal{A},\mu)$ is a quasi-constrictive Markov operator. Then there exist $r\geq 1$, a measurable partition $(E_i)_{i=1}^r$ of $\Omega$ in sets with positive measure, densities $h'_1,\ldots,h'_r\in L^\infty(\Omega,\mathcal{A},\mu)\cap\mathcal{D}(\Omega,\mathcal{A},\mu)$ and a permutation $\sigma$ of $\{1,\ldots,r\}$ such that we have $\sum_{i=1}^r\mu(E_i)\,h'_i = 1$ as well as

$$T^nf(x) = \sum_{i=1}^r\Big(\int_\Omega fh'_i\,d\mu\Big)\mathbf{1}_{E_{\sigma^n(i)}}(x) + R_nf(x)\quad\text{with}\quad\lim_{n\to\infty}\|R_nf\|_1 = 0\qquad (24)$$

for every $f\in L^1(\Omega,\mathcal{A},\mu)$ and every $n\in\mathbb{N}$.
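Proof Using (22) and (23) with $f := \mathbf{1}_\Omega$ and $T^n\mathbf{1}_\Omega = \mathbf{1}_\Omega$, it follows that

$$R_n\mathbf{1}_\Omega(x) = 1 - \sum_{i=1}^r\frac{\|h_i\|_1}{\mu(E_i)}\mathbf{1}_{E_{\sigma^n(i)}}(x) = \sum_{i=1}^r\Big(1 - \frac{\|h_i\|_1}{\mu(E_i)}\Big)\mathbf{1}_{E_{\sigma^n(i)}}(x)$$

for every $x\in\Omega$, which implies

$$0 = \lim_{n\to\infty}\|R_n\mathbf{1}_\Omega\|_1 = \sum_{i=1}^r\Big|1 - \frac{\|h_i\|_1}{\mu(E_i)}\Big|\,\mu(E_{\sigma^n(i)}) = \sum_{i=1}^r\Big|1 - \frac{\|h_i\|_1}{\mu(E_i)}\Big|\,\mu(E_i).$$

Since $\mu(E_i) > 0$ for every $i\in\{1,\ldots,r\}$, this shows that $h'_i := \frac{h_i}{\mu(E_i)}\in L^\infty(\Omega,\mathcal{A},\mu)\cap\mathcal{D}(\Omega,\mathcal{A},\mu)$ for every $i\in\{1,\ldots,r\}$; with this notation, (22) and (23) turn into (24). Furthermore we have $\lim_{n\to\infty}\|R_nh'_i\|_1 = 0$ for every fixed $i$, from which

$$1 = \int_\Omega T^nh'_i(x)\,d\mu(x) = \lim_{n\to\infty}\int_\Omega\sum_{j=1}^r\Big(\int_\Omega h'_i(z)h'_j(z)\,d\mu(z)\Big)\mathbf{1}_{E_{\sigma^n(j)}}(x)\,d\mu(x) = \sum_{j=1}^r\Big(\int_\Omega h'_i(z)h'_j(z)\,d\mu(z)\Big)\mu(E_j)$$

follows. Multiplying both sides with $\mu(E_i)$ and summing over $i\in\{1,\ldots,r\}$ yields

$$1 = \sum_{i=1}^r\mu(E_i) = \int_\Omega g(z)^2\,d\mu(z),\qquad g(z) := \sum_{i=1}^r\mu(E_i)\,h'_i(z),$$

so $g\in\mathcal{D}(\Omega,\mathcal{A},\mu)$ and at the same time $g^2\in\mathcal{D}(\Omega,\mathcal{A},\mu)$. Using the Cauchy-Schwarz inequality (equality in $(\int_\Omega g\,d\mu)^2\leq\int_\Omega g^2\,d\mu$ forces $g$ to be constant) it follows that $g(x) = 1$ for $\mu$-almost every $x\in\Omega$. □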
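A toy instance of (24): the checkerboard copula with Markov matrix $S$ below (our own example, with $T_A$ acting on step functions via $S = NP$) moves the mass of $E_0 = [0,1/2)$ uniformly onto $E_1 = [1/2,1)$ and vice versa. Its density is bounded by 2, so $T_A$ is quasi-constrictive by Example 21, and here the remainder even vanishes for $n\geq 1$: with $h'_i = \mathbf{1}_{E_i}/\lambda(E_i)$ and $\sigma$ the transposition, $T_A^nf$ equals $\sum_i(\int fh'_i\,d\lambda)\mathbf{1}_{E_{\sigma^n(i)}}$ exactly. The sketch uses 0-based indices for the sets.

```python
import numpy as np

N = 4                                                   # cells of width 1/N
S = np.array([[0, 0, .5, .5],                           # hypothetical Markov matrix:
              [0, 0, .5, .5],                           # swaps E_0 = [0, 1/2)
              [.5, .5, 0, 0],                           # and E_1 = [1/2, 1)
              [.5, .5, 0, 0]])
E = [np.array([1., 1., 0., 0.]), np.array([0., 0., 1., 1.])]   # indicators on cells
lam = [e.sum() / N for e in E]                                  # lambda(E_i)
h = [e / l for e, l in zip(E, lam)]                             # h'_i = 1_{E_i} / lambda(E_i)
sigma = [1, 0]                                                  # the transposition


def integral(g):
    """Integral over [0, 1] of the step function with values g on the N cells."""
    return g.sum() / N


def periodic_part(f, n):
    """sum_i (int f h'_i) 1_{E_{sigma^n(i)}}, i.e. (24) without R_n f."""
    out = np.zeros(N)
    for i in range(len(E)):
        k = i
        for _ in range(n):
            k = sigma[k]
        out += integral(f * h[i]) * E[k]
    return out


f = np.array([1.0, 3.0, 0.0, 2.0])
for n in range(1, 5):
    print(n, np.linalg.matrix_power(S, n) @ f, periodic_part(f, n))
```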

Lemma 24 Suppose that $A$ is a copula whose corresponding Markov operator $T_A$ is quasi-constrictive. Then there exist $r\geq 1$, a measurable partition $(E_i)_{i=1}^r$ of $[0,1]$ in sets with positive measure, and pairwise different densities $h_1,\ldots,h_r\in L^\infty([0,1])\cap\mathcal{D}([0,1])$ such that the limit copula $\hat{A}$ of $s^*_n(A)$ is absolutely continuous with density $k_{\hat{A}}$, defined by

$$k_{\hat{A}}(x,y) = \sum_{i=1}^rh_i(y)\,\mathbf{1}_{E_i}(x)\qquad (25)$$


Some Consequences of the Markov Kernel Perspective of Copulas 


405 


for all $x, y\in[0,1]$.

Proof Fix an arbitrary $f\in L^1([0,1])$. Then, using Lemma 23 and reindexing the sum in (24) via $i\mapsto\sigma^{-j}(i)$, we have

$$\frac{1}{n}\sum_{j=1}^nT_A^jf(x) = \sum_{i=1}^r\Big(\int_{[0,1]}f(z)\,g_{i,n}(z)\,d\lambda(z)\Big)\mathbf{1}_{E_i}(x) + \frac{1}{n}\sum_{j=1}^nR_jf(x),\qquad g_{i,n}(z) := \frac{1}{n}\sum_{j=1}^nh'_{\sigma^{-j}(i)}(z),$$

for every $x\in[0,1]$ and every $n\in\mathbb{N}$. Since $\sigma$ is a permutation, $j\mapsto h'_{\sigma^{-j}(i)}(z)$ is periodic for every $z$ and every $i$, so, for every $i\in\{1,\ldots,r\}$, there exists a function $h_i$ such that

$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^nh'_{\sigma^{-j}(i)}(z) = h_i(z)$$

for every $z\in[0,1]$ and every $i\in\{1,\ldots,r\}$. Obviously $h_i\in L^\infty([0,1])$ and, using Lebesgue's theorem on dominated convergence, $h_i$ is also a density, so we have $h_1,\ldots,h_r\in L^\infty([0,1])\cap\mathcal{D}([0,1])$. Finally, using Theorem 19 and the fact that $\lim_{n\to\infty}\|R_nf\|_1 = 0$ for every $f\in L^1([0,1])$, it follows immediately that

$$T_{\hat{A}}f(x) = \int_{[0,1]}f(y)\sum_{i=1}^rh_i(y)\,\mathbf{1}_{E_i}(x)\,d\lambda(y).$$

This completes the proof since mutually different densities can easily be achieved by building unions from elements in the partition $(E_i)_{i=1}^r$ if necessary. □

Using the fact that $\hat{A}$ is idempotent we get the following stronger result:

Lemma 25 The density of $\hat{A}$ in Lemma 24 has the form

$$k_{\hat{A}}(x,y) = \sum_{i,j=1}^rm_{ij}\,\mathbf{1}_{E_i\times E_j}(x,y),$$

i.e., it is constant on all rectangles $E_i\times E_j$.

Proof According to Theorem 19 the copula $\hat{A}$ is idempotent, so $\hat{A}$ is symmetric. Consequently the set

$$\Lambda := \{(x,y)\in[0,1]^2 : k_{\hat{A}}(x,y) = k_{\hat{A}}(y,x)\}\in\mathcal{B}([0,1]^2)$$

has full measure $\lambda_2(\Lambda) = 1$. Using Lemma 24 we have

$$\sum_{i=1}^rh_i(y)\,\mathbf{1}_{E_i}(x) = \sum_{i=1}^rh_i(x)\,\mathbf{1}_{E_i}(y)$$

for every $(x,y)\in\Lambda$. Fix arbitrary $i, j\in\{1,\ldots,r\}$. Then we can find $x\in E_i$ such that $\lambda(\Lambda_x) = 1$ holds, whereby $\Lambda_x = \{y\in[0,1] : (x,y)\in\Lambda\}$. For such $x$ we have $h_i(y) = h_j(x)$ for $\lambda$-almost every $y\in E_j$, which firstly implies that $h_i$ is, up to a set of measure zero, constant on $E_j$ and, secondly, that $k_{\hat{A}}$ is constant on $E_i\times E_j$ outside a set of $\lambda_2$-measure zero. Since we may modify the density on a set of $\lambda_2$-measure zero, we can assume that $k_{\hat{A}}$ is of the desired form

$$k_{\hat{A}}(x,y) = \sum_{i,j=1}^rm_{ij}\,\mathbf{1}_{E_i\times E_j}(x,y),$$

with $M = (m_{ij})_{i,j=1}^r$ being a non-negative, symmetric matrix fulfilling

(a) $\sum_{i,j=1}^rm_{ij}\,\lambda(E_i)\lambda(E_j) = 1$

(b) $\sum_{j=1}^rm_{ij}\,\lambda(E_j) = 1$ for every $i\in\{1,\ldots,r\}$

(c) $\sum_{i=1}^rm_{ij}\,\lambda(E_i) = 1$ for every $j\in\{1,\ldots,r\}$

(d) $\sum_{i=1}^r|m_{ij} - m_{il}| > 0$ whenever $j\neq l$. □

Before proceeding with the final result it is convenient to take a look at the matrix $H = (H_{ij})_{i,j=1}^r$ defined by

$$H_{ij} := m_{ij}\,\lambda(E_j) = \int_{E_j}h_i(z)\,d\lambda(z)\qquad (26)$$

for all $i, j\in\{1,\ldots,r\}$. According to (b) in the proof of Lemma 25, $H$ is stochastic. Furthermore, idempotence of $\hat{A}$ and Remark 16 imply $k_{\hat{A}*\hat{A}} = k_{\hat{A}}$, hence

$$\begin{aligned}\sum_{i=1}^rh_i(y)\,\mathbf{1}_{E_i}(x) &= k_{\hat{A}}(x,y) = k_{\hat{A}*\hat{A}}(x,y) = \int_{[0,1]}\sum_{i=1}^rh_i(z)\,\mathbf{1}_{E_i}(x)\sum_{j=1}^rh_j(y)\,\mathbf{1}_{E_j}(z)\,d\lambda(z)\\ &= \sum_{i,j=1}^r\mathbf{1}_{E_i}(x)\,h_j(y)\int_{E_j}h_i(z)\,d\lambda(z) = \sum_{i,j=1}^r\mathbf{1}_{E_i}(x)\,h_j(y)\,H_{ij}.\end{aligned}$$

From this it follows immediately that $h_i(y) = \sum_{j=1}^rH_{ij}h_j(y)$ is fulfilled for every $y\in[0,1]$ and $i\in\{1,\ldots,r\}$, so, integrating both sides over $E_l$, we have $H_{il} = \sum_{j=1}^rH_{ij}H_{jl}$, which shows that $H$ is idempotent. Having this, the proof of the following main result of this section will be straightforward.

Theorem 26 Suppose that $A$ is a copula whose corresponding Markov operator $T_A$ is quasi-constrictive. Then there exist $r\geq 1$ and a measurable partition $(E_i)_{i=1}^r$ of $[0,1]$ in sets with positive measure, such that the limit copula $\hat{A}$ of $s^*_n(A)$ is absolutely continuous with density $k_{\hat{A}}$ given by

$$k_{\hat{A}}(x,y) = \sum_{i=1}^r\frac{1}{\lambda(E_i)}\,\mathbf{1}_{E_i\times E_i}(x,y)\qquad (27)$$

for all $x, y\in[0,1]$. In other words, the limit copula $\hat{A}$ has an ordinal-sum-of-$\Pi$-like structure.
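Theorem 26 can be made concrete with a shuffle-like checkerboard copula whose Markov matrix is a permutation matrix (a hypothetical example). Its density is bounded by $N$, so $T_A$ is quasi-constrictive by Example 21. The Cesàro means of a permutation matrix average over the cycles of the permutation, so here the partition $(E_i)$ consists of the unions of cells forming one cycle, and (27) predicts the density $1/\lambda(E_i)$ on $E_i\times E_i$ and zero elsewhere. The sketch computes the Cesàro mean with $n$ a multiple of every cycle length (which makes the average exact) and compares the resulting density with this prediction.

```python
import numpy as np

perm = [0, 2, 1, 4, 5, 3]          # hypothetical permutation of N = 6 cells
N = len(perm)
S = np.eye(N)[perm]                # Markov matrix: row i has its 1 in column perm[i]


def cycles(p):
    """Cycle decomposition of a permutation given as a list."""
    seen, out = set(), []
    for start in range(len(p)):
        if start not in seen:
            cyc, k = [], start
            while k not in seen:
                seen.add(k)
                cyc.append(k)
                k = p[k]
            out.append(cyc)
    return out


def cesaro_mean(S, n):
    """(1/n) * sum_{i=1}^n S^i."""
    acc, power = np.zeros_like(S), np.eye(len(S))
    for _ in range(n):
        power = power @ S
        acc += power
    return acc / n


E_hat = cesaro_mean(S, 6000)       # 6000 is a multiple of every cycle length here
density = N * E_hat                # cellwise density of the limit checkerboard copula
expected = np.zeros((N, N))
for cyc in cycles(perm):           # (27): value 1/lambda(E_i) = N/len(cyc) on E_i x E_i
    expected[np.ix_(cyc, cyc)] = N / len(cyc)
print(np.allclose(density, expected))
```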


Proof Since $H$ is an idempotent stochastic matrix and since $H$ cannot have any column consisting purely of zeros, up to a permutation $H$ must have the form (see [5, 21])

$$H = \begin{pmatrix}Q_1 & 0 & \cdots & 0\\ 0 & Q_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & Q_s\end{pmatrix},\qquad (28)$$

whereby each $Q_l$ is a strictly positive $r_l\times r_l$-matrix with identical rows and $s$ is the rank of $H$. We will show that $r_l = 1$ for every $l\in\{1,\ldots,s\}$. Suppose, on the contrary, that $r_l\geq 2$ for some $l$. Then there would be indices $I_l := \{i_1,\ldots,i_{r_l}\}\subseteq\{1,\ldots,r\}$ and $(a_1,\ldots,a_{r_l})\in(0,1)^{r_l}$ with $\sum_{v=1}^{r_l}a_v = 1$ such that $Q_l$ has the form

$$Q_l = \begin{pmatrix}a_1 & a_2 & \cdots & a_{r_l}\\ \vdots & & & \vdots\\ a_1 & a_2 & \cdots & a_{r_l}\end{pmatrix} = \begin{pmatrix}H_{i_1i_1} & H_{i_1i_2} & \cdots & H_{i_1i_{r_l}}\\ \vdots & & & \vdots\\ H_{i_{r_l}i_1} & H_{i_{r_l}i_2} & \cdots & H_{i_{r_l}i_{r_l}}\end{pmatrix}.\qquad (29)$$

It follows immediately that

$$m_{i_1i_v}\,\lambda(E_{i_v}) = H_{i_1i_v} = a_v = H_{i_ji_v} = m_{i_ji_v}\,\lambda(E_{i_v}),$$

so $m_{i_ji_v} = m_{i_1i_v}$ for every $j\in\{1,\ldots,r_l\}$ and arbitrary $v\in\{1,\ldots,r_l\}$. Having this, symmetry of $M$ implies that all entries $m_{i_ui_v}$, $u, v\in\{1,\ldots,r_l\}$, are identical, which contradicts the fact that the conditional densities are not identical, i.e., the fact that

$$\sum_{j\in I_l}|m_{i_1j} - m_{i_2j}| = \sum_{j=1}^r|m_{i_1j} - m_{i_2j}| > 0$$

whenever $i_1\neq i_2$ (the entries with $j\notin I_l$ vanish by (28)). Consequently $r_l = 1$ for every $l\in\{1,\ldots,s\}$, i.e., $H$ is the identity matrix, so $m_{ij} = \delta_{ij}/\lambda(E_j)$ and $k_{\hat{A}}$ has the desired form (27). □

Remark 27 Consider again the transformation matrix $\tau$ from Example 12. Then $\mathcal{V}_\tau^1(\Pi), \mathcal{V}_\tau^2(\Pi), \ldots$ are examples of the ordinal-sum-of-$\Pi$-like copulas mentioned in the last theorem.
Copula Representations for Invariant Dependence Functions


Jayme Pinto and Nikolai Kolev 


Abstract Our main goal is to characterize in terms of copulas the linear Sibuya 
bivariate lack of memory property recently introduced in [12]. As a particular case, 
one can obtain nonaging copulas considered in the literature. 


1 Introduction and Preliminaries 

Let $X_i$ be non-negative continuous random variables with survival functions $S_{X_i}(x_i) = P(X_i > x_i)$ and densities $f_{X_i}(x_i)$, $i = 1,2$. Denote by $S(x_1,x_2) = P(X_1 > x_1, X_2 > x_2)$ the joint survival function of the random vector $(X_1,X_2)$. Following [13], any bivariate survival function can be decomposed as a product of marginal survival functions and a dependence function $\Omega(x_1,x_2)$ via

$$S(x_1,x_2) = S_{X_1}(x_1)\,S_{X_2}(x_2)\,\Omega(x_1,x_2)\quad\text{for all } x_1,x_2\geq 0.\qquad (1)$$

The function $\Omega(x_1,x_2)$ represents the contribution to the dependence expressed by $S(x_1,x_2)$ that is free of marginal influence. A family of Sibuya copulas is introduced in [6], where the authors are motivated by a particular dynamic default model. Our analysis is based on the relation

$$S(x_1+t,x_2+t) = S(x_1,x_2)\,S(t,t)\,B(x_1,x_2;t),\quad t\geq 0,\qquad (2)$$

where $B(x_1,x_2;t)$ is an appropriate "aging" function satisfying the boundary conditions $B(x_1,x_2;0) = B(0,0;t) = 1$. In fact, incorporating a time component in the arguments, we replace the product of marginal survival functions in (1) by the product of joint survival functions with nonoverlapping arguments.
In the simplest case, when $B(x_1,x_2;t) = 1$ in (2), one gets the functional equation

$$S(x_1+t,x_2+t) = S(x_1,x_2)\,S(t,t)\qquad (3)$$

for all $x_1,x_2\geq 0$ and $t\geq 0$. Bivariate continuous distributions satisfying (3) possess the classical bivariate lack of memory property (BLMP).

The only solution of (3) with exponential marginals is the Marshall-Olkin bivariate exponential distribution introduced in [9]. However, there do exist distributions having BLMP with nonexponential marginals. Various solutions of functional equation (3) are presented in [7], where the marginals may have any kind of failure rate: increasing, decreasing, bathtub, etc. It is well known that under BLMP the distribution of $(X_1,X_2)$ coincides with that of its residual lifetime vector

$$\mathbf{X}_t = (X_{1t}, X_{2t}) = \big[(X_1 - t, X_2 - t)\mid X_1 > t, X_2 > t\big]$$

for every $t\geq 0$, i.e., $(X_1,X_2)\stackrel{d}{=}\mathbf{X}_t$, implying $X_i\stackrel{d}{=}X_{it}$, $i = 1,2$, for all $t\geq 0$.

Remark 1 Under BLMP, the vectors $(X_1,X_2)$ and $\mathbf{X}_t$ necessarily have the same survival copula, which is unique under continuity of $X_i$, $i = 1,2$. Therefore, BLMP implies that the corresponding survival copulas are time invariant (nonaging).

The joint survival function of $\mathbf{X}_t$ is given by $S_{\mathbf{X}_t}(x_1,x_2) = S(x_1+t,x_2+t)/S(t,t)$. Its marginal survival functions are $S_{X_{1t}}(x_1) = S(x_1+t,t)/S(t,t)$ and $S_{X_{2t}}(x_2) = S(t,x_2+t)/S(t,t)$. Applying the Sibuya form representation (1) to the residual lifetime vector $\mathbf{X}_t$ we have

$$S_{\mathbf{X}_t}(x_1,x_2) = S_{X_{1t}}(x_1)\,S_{X_{2t}}(x_2)\,\Omega_t(x_1,x_2),\qquad (4)$$

where $\Omega_t(x_1,x_2)$ is the dependence function of $\mathbf{X}_t$.

We will consider a class of continuous bivariate distributions for which $\Omega_t(x_1,x_2)$ does not depend on $t\geq 0$, i.e., we impose the condition $\Omega_t(x_1,x_2) = \Omega(x_1,x_2)$, where $\Omega(x_1,x_2)$ is the dependence function of $(X_1,X_2)$ from (1). Such a class with memoryless dependence function has recently been introduced in [12] as follows.

Definition 1 The nonnegative continuous bivariate distribution $(X_1,X_2)$ possesses the linear Sibuya BLMP (abbreviated LS-BLMP) if

$$\frac{S_{\mathbf{X}_t}(x_1,x_2)}{S_{X_{1t}}(x_1)\,S_{X_{2t}}(x_2)} = \frac{S(x_1,x_2)}{S_{X_1}(x_1)\,S_{X_2}(x_2)}\qquad (5)$$

for all $x_1,x_2,t\geq 0$ and

$$S_{X_{it}}(x_i) = S_{X_i}(x_i)\exp\{-a_ix_it\}\quad\text{for } a_i\geq 0,\ i = 1,2.\qquad (6)$$
Observe that BLMP distributions satisfy (5). This means that the class of bivariate continuous distributions with LS-BLMP includes those possessing BLMP.

Let us assume that the partial derivatives of $S(x_1,x_2)$ exist and are continuous. Denote by $r_i(x_1,x_2) = -\partial\ln S(x_1,x_2)/\partial x_i$, $i = 1,2$, the conditional failure rates. In [12] a class $\mathcal{L}(\mathbf{x};\mathbf{a})$ of nonnegative bivariate continuous distributions is introduced that satisfy the relation

$$r(x_1,x_2) = r_1(x_1,x_2) + r_2(x_1,x_2) = a_0 + a_1x_1 + a_2x_2\quad\text{for } a_0,a_1,a_2\geq 0\qquad (7)$$

for all $x_1,x_2\geq 0$, where $\mathbf{x} = (x_1,x_2)$ and $\mathbf{a} = (a_0,a_1,a_2)$ is the parameter vector.

When the survival function $S(x_1,x_2)$ is differentiable, the sum $r_1(x_1,x_2) + r_2(x_1,x_2)$ has the following interpretation in terms of directional derivatives: since $r_1(x_1,x_2) + r_2(x_1,x_2) = -\frac{d}{ds}\ln S(x_1+s,x_2+s)\big|_{s=0}$, it describes the behavior of $-\ln S(x_1,x_2)$ along lines parallel to the diagonal $\{x_1 = x_2\}$, i.e., with $45^\circ$ inclination.

Managing a portfolio means observing and controlling its value changes over time to achieve a desired outcome. The vector $(r_1(x_1,x_2), r_2(x_1,x_2))$ of partial derivatives of $-\ln S(x_1,x_2)$ is its gradient. With the gradient at hand, the risk manager can evaluate the incremental impact of changes to the portfolio.

The Marshall-Olkin bivariate exponential distribution is a widely used model in risk management and possesses BLMP, see Chap. 3 in [8]. The class $\mathcal{L}(\mathbf{x};\mathbf{a})$ reduces to BLMP when $a_1 = a_2 = 0$ in (7), i.e., when $r_1(x_1,x_2) + r_2(x_1,x_2) = a_0$.

The sum in (7) may serve as a complementary risk measure. For example, the portfolio can be considered "risky" if $r_1(x_1,x_2) + r_2(x_1,x_2) > a_0 + a_1x_1 + a_2x_2$, where the parameters $a_0$, $a_1$ and $a_2$ are fixed in advance by an expert.
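Relations (6), (7) and (8) are easy to check numerically for a concrete member of the class. The sketch below uses Gumbel's type I bivariate exponential survival function $S(x_1,x_2) = \exp\{-\lambda_1x_1 - \lambda_2x_2 - \theta\lambda_1\lambda_2x_1x_2\}$, for which $r_1 = \lambda_1 + \theta\lambda_1\lambda_2x_2$ and $r_2 = \lambda_2 + \theta\lambda_1\lambda_2x_1$, i.e., $a_0 = \lambda_1 + \lambda_2$ and $a_1 = a_2 = \theta\lambda_1\lambda_2$. The parameter values are illustrative assumptions of ours, and the failure rates are approximated by central differences.

```python
import math

LAM1, LAM2, THETA = 1.0, 2.0, 0.5                      # illustrative parameters
A0, A1, A2 = LAM1 + LAM2, THETA * LAM1 * LAM2, THETA * LAM1 * LAM2


def S(x1, x2):
    """Gumbel's type I bivariate exponential survival function."""
    return math.exp(-LAM1 * x1 - LAM2 * x2 - THETA * LAM1 * LAM2 * x1 * x2)


def failure_rate_sum(x1, x2, h=1e-5):
    """r_1 + r_2 = -(d/dx1 + d/dx2) ln S, approximated by central differences."""
    r1 = -(math.log(S(x1 + h, x2)) - math.log(S(x1 - h, x2))) / (2 * h)
    r2 = -(math.log(S(x1, x2 + h)) - math.log(S(x1, x2 - h))) / (2 * h)
    return r1 + r2


x1, x2, t = 0.7, 1.3, 0.4
print(failure_rate_sum(x1, x2), A0 + A1 * x1 + A2 * x2)                   # relation (7)
print(S(x1 + t, x2 + t) / S(t, t),
      S(x1, x2) * math.exp(-A1 * x1 * t - A2 * x2 * t))                   # relation (8)
```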

The joint survival function corresponding to (7) is given by

$$S(x_1,x_2) = \begin{cases}S_{X_1}(x_1 - x_2)\exp\big\{-a_0x_2 - a_1x_1x_2 - \tfrac{a_2 - a_1}{2}x_2^2\big\}, & \text{if } x_1\geq x_2\geq 0;\\[2pt] S_{X_2}(x_2 - x_1)\exp\big\{-a_0x_1 - a_2x_1x_2 - \tfrac{a_1 - a_2}{2}x_1^2\big\}, & \text{if } x_2\geq x_1\geq 0.\end{cases}$$

Remark 2 The joint survival function $S(x_1,x_2)$ in the previous expression is proper only for certain marginals $S_{X_1}(x_1)$ and $S_{X_2}(x_2)$. Their choice determines the range of possible values for the non-negative parameters $a_0$, $a_1$ and $a_2$, see Theorem 5.2.14 and Proposition 5.2.17 in [12]. The nonnegative parameter $a_0$ plays an important role in the class $\mathcal{L}(\mathbf{x};\mathbf{a})$. If $a_0 = f_{X_1}(0) + f_{X_2}(0)$, the joint survival function $S(x_1,x_2)$ is absolutely continuous, and if $a_0 < f_{X_1}(0) + f_{X_2}(0)$, the distribution exhibits a singular component.

It turns out that the class $\mathcal{L}(\mathbf{x};\mathbf{a})$ specified by (7) can be characterized by the LS-BLMP defined by (5) and (6). The class $\mathcal{L}(\mathbf{x};\mathbf{a})$ contains continuous bivariate distributions that are symmetric or asymmetric, positive quadrant dependent or negative quadrant dependent, absolutely continuous or exhibiting a singular component. In addition, $\mathcal{L}(\mathbf{x};\mathbf{a})$ can be equivalently represented by relation (2) with $B(x_1,x_2;t) = \exp\{-a_1x_1t - a_2x_2t\}$, i.e., by

$$\frac{S(x_1+t,x_2+t)}{S(t,t)} = S(x_1,x_2)\exp\{-a_1x_1t - a_2x_2t\}.\qquad (8)$$

In Sect. 2, we characterize the class $\mathcal{L}(\mathbf{x};\mathbf{a})$ (or equivalently LS-BLMP) in copula terms, using the functional equation (8) as a basis. Recall that the time invariance (nonaging) of the dependence function $\Omega(x_1,x_2)$ refers to the preservation of the dependence function $\Omega_t(x_1,x_2)$ given in Sibuya form (4). This justifies our suggestion that the corresponding copula be named "Sibuya-type copula." In Sect. 3, we discuss bivariate survival functions with nonaging survival copulas and obtain known relations as particular cases of our findings.


2 Copula Representations of the Class $\mathcal{L}(\mathbf{x};\mathbf{a})$

Let the vector $(X_1,X_2)$ be a member of the class $\mathcal{L}(\mathbf{x};\mathbf{a})$. Hence, the survival function of the corresponding residual lifetime vector $\mathbf{X}_t$ is given by (8). Denote by $C$ and $C_t$ the survival copulas of $(X_1,X_2)$ and $\mathbf{X}_t$, respectively. First, we find a relation between the survival copulas $C$ and $C_t$. As a second step, we obtain a characterizing functional equation for the survival copula $C_t$ that joins the corresponding marginals on both sides of (8).

Theorem 1 Let $(X_1,X_2)$ belong to the class $\mathcal{L}(\mathbf{x};\mathbf{a})$. The survival copulas of $\mathbf{X}_t$ and $(X_1,X_2)$ are connected by

$$C_t(u,v) = C\big(\exp\{-H_1(G_{1t}^{-1}(-\ln u))\},\ \exp\{-H_2(G_{2t}^{-1}(-\ln v))\}\big)\times\exp\{-a_1tG_{1t}^{-1}(-\ln u) - a_2tG_{2t}^{-1}(-\ln v)\},\qquad (9)$$

where $u, v\in(0,1]$, $H_i(x_i) = -\ln[S_{X_i}(x_i)]$ and $G_{it}(x_i) = H_i(x_i) + a_ix_it$, $i = 1,2$.

Proof The marginals of $\mathbf{X}_t$ have survival functions specified by (6). Using Sklar's theorem, relation (8) can be rewritten in terms of the survival copulas $C_t$ and $C$ as follows

$$C_t\big(S_{X_1}(x_1)\exp\{-a_1x_1t\},\ S_{X_2}(x_2)\exp\{-a_2x_2t\}\big) = C\big(S_{X_1}(x_1), S_{X_2}(x_2)\big)\exp\{-a_1x_1t - a_2x_2t\}.\qquad (10)$$

Let $u = S_{X_1}(x_1)\exp\{-a_1x_1t\}$ and $v = S_{X_2}(x_2)\exp\{-a_2x_2t\}$. From the relations $S_{X_i}(x_i) = \exp\{-H_i(x_i)\}$ and $G_{it}(x_i) = H_i(x_i) + a_ix_it$, $i = 1,2$, we get $x_1 = G_{1t}^{-1}(-\ln u)$ and $x_2 = G_{2t}^{-1}(-\ln v)$. Using these equations in (10) we obtain (9). □

Relation (9) shows that the survival copulas of $(X_1,X_2)$ and $\mathbf{X}_t$ do not coincide in general. The time invariance (nonaging) in the class $\mathcal{L}(\mathbf{x};\mathbf{a})$ (being equivalent to LS-BLMP) is related to the memoryless dependence function $\Omega_t$ of the residual lifetime vector $\mathbf{X}_t$, see relation (5). For comparison only, recall that the time invariance for BLMP distributions concerns the joint distribution of $\mathbf{X}_t$.

Substituting $a_1 = a_2 = 0$ in (9), we get $C_t(u,v) = C(u,v)$ for all $t\geq 0$, i.e., the survival copula $C_t$ is time invariant, see Remark 1. The conclusion is the same if $X_1$ and $X_2$ are independent, i.e., $C(u,v) = uv$. Thus, we have the following result.

Corollary 1 Under the conditions of Theorem 1, if

(i) $a_1 = a_2 = 0$ or

(ii) $X_1$ is independent of $X_2$,

then $C_t(u,v) = C(u,v)$ for all $u, v\in(0,1]$ and $t\geq 0$.

The next example illustrates the relations established. 

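Example 1 Let the vector $(X_1, X_2)$ belong to $\mathcal{L}(x; a)$. Suppose that the marginals are exponentially distributed, i.e., $S_{X_i}(x) = \exp\{-\lambda_i x\}$, $\lambda_i > 0$, $i = 1, 2$. Therefore, $G_{it}(x) = \lambda_i x + a_i x t$ and $G_{it}^{-1}(u) = u/(\lambda_i + a_i t)$, $i = 1, 2$. From (9) we obtain

$$C_t(u,v) = C\left(\exp\left\{\frac{\lambda_1 \ln u}{\lambda_1 + a_1 t}\right\}, \exp\left\{\frac{\lambda_2 \ln v}{\lambda_2 + a_2 t}\right\}\right) \exp\left\{\frac{a_1 t \ln u}{\lambda_1 + a_1 t} + \frac{a_2 t \ln v}{\lambda_2 + a_2 t}\right\},$$

which can be simplified to

$$C_t(u,v) = C\left(u^{\frac{\lambda_1}{\lambda_1 + a_1 t}}, v^{\frac{\lambda_2}{\lambda_2 + a_2 t}}\right) u^{\frac{a_1 t}{\lambda_1 + a_1 t}}\, v^{\frac{a_2 t}{\lambda_2 + a_2 t}}. \quad (11)$$

Relation (11) gives a general expression for the survival copula $C_t(u, v)$ corresponding to $X_t$ for all members of the class $\mathcal{L}(x; a)$ with exponential marginals.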

Assume further that $(X_1, X_2)$ follows Gumbel's type I bivariate exponential distribution with survival function

$$S(x_1, x_2) = \exp\{-\lambda_1 x_1 - \lambda_2 x_2 - \theta\lambda_1\lambda_2 x_1 x_2\}, \quad \theta \in [0, 1],\ \lambda_1, \lambda_2 > 0,$$

see [5]. This distribution is a member of the class $\mathcal{L}(x; a)$, and the constants in (7) are specified by $a_0 = \lambda_1 + \lambda_2$ and $a_1 = a_2 = \theta\lambda_1\lambda_2$. The corresponding survival copula is $C(u, v) = uv\exp\{-\theta \ln u \ln v\}$. Substituting $C(u, v)$ in (11), we obtain $C_t(u, v) = uv\exp\{-\theta \ln u \ln v/[(1 + \theta\lambda_2 t)(1 + \theta\lambda_1 t)]\}$. Therefore, the survival copula $C_t(u, v)$ depends on $t$ as well.
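The simplification leading to (11) and the Gumbel closed form can be checked numerically. The sketch below (hypothetical function names; parameter values in the check are arbitrary) evaluates (11) for a generic survival copula and compares it with the closed form for Gumbel's survival copula, including the limits $t = 0$ and $t \to \infty$.

```python
import math


def ct_exponential_marginals(C, lam1, lam2, a1, a2, t):
    """C_t from relation (11) for a member of the class with Exp(lam_i) marginals."""
    p1 = lam1 / (lam1 + a1 * t)
    p2 = lam2 / (lam2 + a2 * t)

    def C_t(u, v):
        return C(u ** p1, v ** p2) * u ** (1.0 - p1) * v ** (1.0 - p2)

    return C_t


def gumbel_ct_closed_form(lam1, lam2, theta, t):
    """Closed form of C_t for Gumbel's type I bivariate exponential distribution."""
    k = (1.0 + theta * lam2 * t) * (1.0 + theta * lam1 * t)

    def C_t(u, v):
        return u * v * math.exp(-theta * math.log(u) * math.log(v) / k)

    return C_t
```

For very large $t$ both functions approach $uv$, and at $t = 0$ they return $C(u, v)$ itself.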

When $t = 0$ in (11), we recover the survival copula $C(u, v)$ of $(X_1, X_2)$, and letting $t \to \infty$ (with $a_1, a_2 > 0$), we obtain the independence copula $C_\infty(u, v) = uv$. Notice that, for Gumbel's distribution, the independence of $X_1$ and $X_2$ ($\theta = 0$) is equivalent to the condition $a_1 = a_2 = 0$.

Now, our interest is to find a characterizing functional equation involving the survival copula $C_t$ of $X_t$ for the absolutely continuous members of the class $\mathcal{L}(x; a)$.

Theorem 2 Let the survival copula $C_t$ of $X_t$ be differentiable in its arguments. The absolutely continuous random vector $(X_1, X_2)$ belongs to the class $\mathcal{L}(x; a)$ if and only if there exist non-negative constants $a_1$ and $a_2$ such that

$$C_t\left(\frac{S(x_1 + t, t)}{S(t, t)}, \frac{S(t, x_2 + t)}{S(t, t)}\right) = C_t\big(S_{X_1}(x_1)\exp\{-a_1 x_1 t\}, S_{X_2}(x_2)\exp\{-a_2 x_2 t\}\big) \quad (12)$$

for all $x_1, x_2, t > 0$.

Proof Let us assume that the functional equation (12) is satisfied. We will show that (7) is fulfilled. Taking the derivative of both sides of (12) with respect to $t$, we obtain

$$\begin{aligned}
& C_t^1\left(\frac{S(x_1+t,t)}{S(t,t)}, \frac{S(t,x_2+t)}{S(t,t)}\right) \frac{[S^1(x_1+t,t) + S^2(x_1+t,t)]S(t,t) - S(x_1+t,t)[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2} \\
&\quad + C_t^2\left(\frac{S(x_1+t,t)}{S(t,t)}, \frac{S(t,x_2+t)}{S(t,t)}\right) \frac{[S^1(t,x_2+t) + S^2(t,x_2+t)]S(t,t) - S(t,x_2+t)[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2} \\
&= C_t^1\big(S_{X_1}(x_1)e^{-a_1 x_1 t}, S_{X_2}(x_2)e^{-a_2 x_2 t}\big)\big(-a_1 x_1 S_{X_1}(x_1)e^{-a_1 x_1 t}\big) \\
&\quad + C_t^2\big(S_{X_1}(x_1)e^{-a_1 x_1 t}, S_{X_2}(x_2)e^{-a_2 x_2 t}\big)\big(-a_2 x_2 S_{X_2}(x_2)e^{-a_2 x_2 t}\big),
\end{aligned}$$

where the superscripts 1 and 2 denote the partial derivatives with respect to the first and second arguments of the corresponding functions. Letting $x_1 = 0$ in the last equation, the terms containing $C_t^1$ vanish on both sides, and we have

$$C_t^2\left(1, \frac{S(t,x_2+t)}{S(t,t)}\right)\frac{[S^1(t,x_2+t) + S^2(t,x_2+t)]S(t,t) - S(t,x_2+t)[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2} = C_t^2\big(1, S_{X_2}(x_2)e^{-a_2 x_2 t}\big)\big(-a_2 x_2 S_{X_2}(x_2)e^{-a_2 x_2 t}\big).$$

When $x_i = 0$ in (12), we get relations (6) in Definition 1, $i = 1, 2$. Hence the arguments of $C_t^2$ on both sides coincide, and therefore

$$\frac{[S^1(t,x_2+t) + S^2(t,x_2+t)]S(t,t) - S(t,x_2+t)[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2} = -a_2 x_2 S_{X_2}(x_2)\exp\{-a_2 x_2 t\}.$$

Since $r(t, x_2+t) = -[S^1(t, x_2+t) + S^2(t, x_2+t)]/S(t, x_2+t)$ and $r(t,t) = -[S^1(t,t) + S^2(t,t)]/S(t,t)$, we get

$$\frac{S(t, x_2+t)}{S(t,t)}\,[-r(t, x_2+t) + r(t,t)] = -a_2 x_2 S_{X_2}(x_2)\exp\{-a_2 x_2 t\},$$

which, because $S(t, x_2+t)/S(t,t) = S_{X_2}(x_2)\exp\{-a_2 x_2 t\}$ by (6), is equivalent to

$$r(t, x_2 + t) = r(t, t) + a_2 x_2. \quad (13)$$


Analogously, we obtain the equation

$$r(x_1 + t, t) = r(t, t) + a_1 x_1. \quad (14)$$


Now, we will represent $r(t, t)$ as a function of $a_0$, $a_1$, $a_2$ and $t$. Taking the partial derivative of (12) with respect to $x_1$, we have

$$C_t^1\left(\frac{S(x_1+t,t)}{S(t,t)}, \frac{S(t,x_2+t)}{S(t,t)}\right)\frac{S^1(x_1+t,t)}{S(t,t)} = C_t^1\big(S_{X_1}(x_1)e^{-a_1 x_1 t}, S_{X_2}(x_2)e^{-a_2 x_2 t}\big) \times \big(-f_{X_1}(x_1)e^{-a_1 x_1 t} - a_1 t S_{X_1}(x_1)e^{-a_1 x_1 t}\big).$$

Applying (6) in the last equation, we obtain

$$\frac{S^1(x_1+t,t)}{S(t,t)} = -f_{X_1}(x_1)\exp\{-a_1 x_1 t\} - a_1 t S_{X_1}(x_1)\exp\{-a_1 x_1 t\},$$

and putting $x_1 = 0$, we have $r_1(t,t) = f_{X_1}(0) + a_1 t$, where $r_1(t,t) = -S^1(t,t)/S(t,t)$. Similarly, we get $r_2(t,t) = f_{X_2}(0) + a_2 t$. The sum of the last two equations gives

$$r(t,t) = r_1(t,t) + r_2(t,t) = [f_{X_1}(0) + f_{X_2}(0)] + a_1 t + a_2 t.$$

Let $t = 0$ in the last relation to get $f_{X_1}(0) + f_{X_2}(0) = a_0 > 0$. Thus,

$$r(t,t) = a_0 + a_1 t + a_2 t.$$


Taking into account (13) and (14), we conclude that $r(x_1, x_2) = a_0 + a_1 x_1 + a_2 x_2$. Therefore, we obtain relation (7), which defines the class $\mathcal{L}(x; a)$. In addition, the corresponding bivariate distributions are absolutely continuous because of the equation $f_{X_1}(0) + f_{X_2}(0) = a_0$, see Remark 2.

Conversely, assume that the random vector $(X_1, X_2)$ belonging to the class $\mathcal{L}(x; a)$ is absolutely continuous. Then (8), being equivalent to (5) and (6), is valid. In addition, relations (6) show that the marginal distributions on both sides of (8) coincide. Applying Sklar's theorem to (8), we obtain the functional equation (12). □

Since the dependence function $\Omega_t$ satisfies the Sibuya form (4), we refer to the survival copula $C_t$ characterized by the functional equation (12) as a Sibuya-type copula.

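Example 2 Let us consider the absolutely continuous joint survival function

$$S(x_1, x_2) = \begin{cases} \exp\left\{-\left[\lambda_1 x_1 + \lambda_2 x_2 + \lambda_1\lambda_2 x_2\left(\theta_1 x_1 + \frac{\theta_2 - \theta_1}{2} x_2\right)\right]\right\}, & \text{if } x_1 \ge x_2 \ge 0; \\ \exp\left\{-\left[\lambda_1 x_1 + \lambda_2 x_2 + \lambda_1\lambda_2 x_1\left(\theta_2 x_2 + \frac{\theta_1 - \theta_2}{2} x_1\right)\right]\right\}, & \text{if } x_2 > x_1 \ge 0, \end{cases}$$

where $\theta_i \in (0, 1]$ and $\lambda_i > 0$, $i = 1, 2$. This distribution was obtained in [12] and can be named the generalized Gumbel bivariate exponential distribution with parameters $\lambda_i$ and $\theta_i$, $i = 1, 2$. If $\theta_1 = \theta_2 = \theta$, we get the Gumbel distribution considered in Example 1. The marginal survival functions are $S_{X_i}(x_i) = \exp\{-\lambda_i x_i\}$, $i = 1, 2$.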

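The survival function of the residual lifetime vector $X_t$ is given by (8). After some algebra, we get the corresponding survival copula

$$C_t(u,v) = \begin{cases} uv\exp\left\{-\frac{\theta_1}{\gamma_1(t)\gamma_2(t)}\ln u \ln v\right\}\exp\left\{-\frac{(\theta_2 - \theta_1)\lambda_1}{2\lambda_2\gamma_1^2(t)}(\ln v)^2\right\}, & \text{if } u^{-\lambda_2\gamma_1(t)} \ge v^{-\lambda_1\gamma_2(t)}; \\ uv\exp\left\{-\frac{\theta_2}{\gamma_1(t)\gamma_2(t)}\ln u \ln v\right\}\exp\left\{-\frac{(\theta_1 - \theta_2)\lambda_2}{2\lambda_1\gamma_2^2(t)}(\ln u)^2\right\}, & \text{if } u^{-\lambda_2\gamma_1(t)} < v^{-\lambda_1\gamma_2(t)}, \end{cases}$$

where $\gamma_1(t) = 1 + \lambda_1\theta_2 t$, $\gamma_2(t) = 1 + \lambda_2\theta_1 t$ and $u, v \in (0, 1]$. Fix $a_i = \lambda_1\lambda_2\theta_i$, $i = 1, 2$, in (12) to verify that

$$C_t\big(\exp\{-\lambda_1 x_1 - \lambda_1\lambda_2\theta_1 x_1 t\}, \exp\{-\lambda_2 x_2 - \lambda_1\lambda_2\theta_2 x_2 t\}\big) = \frac{S(x_1 + t, x_2 + t)}{S(t, t)}$$

for all $t > 0$. Therefore, the generalized Gumbel bivariate exponential distribution is a member of the class $\mathcal{L}(x; a)$.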
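Example 2 can be checked numerically. The sketch below (our own; names and parameter values are illustrative) verifies that $r(x_1, x_2) = -(\partial_1 + \partial_2)\ln S$ of the piecewise survival function is linear, $r = a_0 + a_1 x_1 + a_2 x_2$ with $a_0 = \lambda_1 + \lambda_2$ and $a_i = \lambda_1\lambda_2\theta_i$, and that the piecewise $C_t$ reproduces $S(x_1 + t, x_2 + t)/S(t, t)$ at the marginal values of $X_t$.

```python
import math


def log_survival(x1, x2, lam1, lam2, th1, th2):
    """ln S(x1, x2) of the generalized Gumbel bivariate exponential (Example 2)."""
    if x1 >= x2:
        q = lam1 * lam2 * x2 * (th1 * x1 + 0.5 * (th2 - th1) * x2)
    else:
        q = lam1 * lam2 * x1 * (th2 * x2 + 0.5 * (th1 - th2) * x1)
    return -(lam1 * x1 + lam2 * x2 + q)


def hazard_sum(x1, x2, params, h=1e-4):
    """r(x1, x2) = -(d/dx1 + d/dx2) ln S by a central difference along (1, 1).
    Moving along (1, 1) keeps x1 - x2 fixed, so both points use one branch."""
    f = lambda s: log_survival(x1 + s, x2 + s, *params)
    return -(f(h) - f(-h)) / (2.0 * h)


def residual_copula(u, v, t, lam1, lam2, th1, th2):
    """Piecewise survival copula C_t(u, v) of X_t as stated in Example 2."""
    g1 = 1.0 + lam1 * th2 * t
    g2 = 1.0 + lam2 * th1 * t
    lu, lv = math.log(u), math.log(v)
    if -lam2 * g1 * lu >= -lam1 * g2 * lv:      # u^{-lam2 g1} >= v^{-lam1 g2}
        e = th1 * lu * lv / (g1 * g2) + (th2 - th1) * lam1 * lv ** 2 / (2 * lam2 * g1 ** 2)
    else:
        e = th2 * lu * lv / (g1 * g2) + (th1 - th2) * lam2 * lu ** 2 / (2 * lam1 * g2 ** 2)
    return u * v * math.exp(-e)
```

Since $\ln S$ is quadratic on each branch, the central difference is exact up to rounding, so the linearity check is sharp.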


3 Bivariate Survival Functions with Nonaging Survival Copulas


In this section, we consider nonaging survival copulas $C(u, v)$ instead of memoryless dependence functions $\Omega_t(x_1, x_2)$.

Let us denote by $\mathcal{A}$ the class of continuous bivariate survival functions $S(x_1, x_2)$ such that $(X_1, X_2)$ and $X_t$ have the same survival copula $C(u, v)$. Therefore, the functional equation

$$\frac{C(S_{X_1}(x_1 + t), S_{X_2}(x_2 + t))}{C(S_{X_1}(t), S_{X_2}(t))} = C\left(\frac{C(S_{X_1}(x_1 + t), S_{X_2}(t))}{C(S_{X_1}(t), S_{X_2}(t))}, \frac{C(S_{X_1}(t), S_{X_2}(x_2 + t))}{C(S_{X_1}(t), S_{X_2}(t))}\right) \quad (15)$$

has to be satisfied for all $x_1, x_2 \ge 0$ and $t \ge 0$. In what follows, we call the survival copula $C$ time invariant (or nonaging) if it corresponds to a member of the class $\mathcal{A}$.

Taking into account the conclusion in Remark 1, all bivariate survival functions possessing BLMP belong to $\mathcal{A}$. This time invariance property, however, is not restricted to BLMP survival functions. For instance, it is well known that the Clayton bivariate survival function given by

$$S(x_1, x_2) = \left(S_{X_1}^{-\theta}(x_1) + S_{X_2}^{-\theta}(x_2) - 1\right)^{-1/\theta}, \quad \theta \in (0, \infty),$$

has a time invariant survival copula. Other members of the class $\mathcal{A}$ can be found in Examples 3 and 4.
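Functional equation (15) can be tested numerically for a candidate copula. The sketch below is our own: it evaluates both sides of (15) for the Clayton copula with hypothetical Weibull and Lomax marginals, chosen only because they are not exponential. The difference stays at rounding level, in line with the time invariance of the Clayton survival copula for arbitrary marginals.

```python
import math


def clayton(u, v, theta=1.5):
    """Clayton copula (u^-theta + v^-theta - 1)^(-1/theta) with theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def eq15_gap(C, S1, S2, x1, x2, t):
    """Left-hand side minus right-hand side of functional equation (15)."""
    d = C(S1(t), S2(t))
    lhs = C(S1(x1 + t), S2(x2 + t)) / d
    rhs = C(C(S1(x1 + t), S2(t)) / d, C(S1(t), S2(x2 + t)) / d)
    return lhs - rhs


def weibull_survival(x):
    """Hypothetical marginal: Weibull survival function with shape 1.7."""
    return math.exp(-x ** 1.7)


def lomax_survival(x):
    """Hypothetical marginal: Lomax (Pareto II) survival function with shape 2."""
    return (1.0 + x) ** -2.0
```

The same `eq15_gap` helper can be reused with any other copula and marginals to probe membership in $\mathcal{A}$.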
Let $\Phi = \{(u, v) \in (0, 1]^2 \mid u = S_{X_1}(t),\ v = S_{X_2}(t),\ t > 0\}$ be a curve on the unit square parameterized by $t > 0$. From (15), we may then obtain nonaging survival copulas whenever $C$ is invariant on the curve $\Phi$. In particular, if $X_1$ and $X_2$ are identically distributed, $\Phi$ is the main diagonal of the unit square, and we need invariance of the survival copula along it.

Example 3 [Invariance on the main diagonal] The Cuadras-Augé survival copula

$$C_\alpha(u, v) = [\min(u, v)]^\alpha [uv]^{1-\alpha}, \quad \alpha \in [0, 1],$$

is invariant on the main diagonal of the unit square, see [2]. Let us initially consider equally distributed marginals $S_{X_1}(x) = S_{X_2}(x) = S_X(x)$. If $X$ is exponentially distributed, then $S(x_1, x_2) = C_\alpha(S_{X_1}(x_1), S_{X_2}(x_2))$ is a particular case of the Marshall-Olkin bivariate exponential distribution, see [9], possessing BLMP and, consequently, belonging to the class $\mathcal{A}$. Now, let $X$ be a gamma distributed random variable. In this case, BLMP does not hold, but the corresponding joint survival function still belongs to $\mathcal{A}$.

In a third scenario, where $X_1$ and $X_2$ do not share the same distribution but are joined by the Cuadras-Augé survival copula, $S(x_1, x_2)$ neither possesses BLMP nor belongs to $\mathcal{A}$.
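The scenarios of Example 3 can be reproduced with the same check of (15). In this sketch the gamma marginal has shape 2 and rate 1, an assumption made only because its survival function $(1 + x)e^{-x}$ has a closed form; $\alpha = 0.5$ and the evaluation points are arbitrary.

```python
import math


def cuadras_auge(u, v, alpha=0.5):
    """Cuadras-Auge copula [min(u, v)]^alpha [u v]^(1 - alpha)."""
    return min(u, v) ** alpha * (u * v) ** (1.0 - alpha)


def eq15_gap(C, S1, S2, x1, x2, t):
    """Left-hand side minus right-hand side of functional equation (15)."""
    d = C(S1(t), S2(t))
    lhs = C(S1(x1 + t), S2(x2 + t)) / d
    rhs = C(C(S1(x1 + t), S2(t)) / d, C(S1(t), S2(x2 + t)) / d)
    return lhs - rhs


def gamma2_survival(x):
    """Gamma (shape 2, rate 1) survival function (1 + x) e^{-x}."""
    return (1.0 + x) * math.exp(-x)


def exp_survival(x):
    """Standard exponential survival function."""
    return math.exp(-x)
```

With two identical gamma marginals the gap vanishes up to rounding; with an exponential and a gamma marginal it does not (for instance at $x_1 = 2$, $x_2 = 0.5$, $t = 1$).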

Example 4 [Invariance along a curve] The Marshall-Olkin survival copula

$$C_{\alpha,\beta}(u, v) = \min(u^{1-\alpha} v, u v^{1-\beta}), \quad \alpha, \beta \in (0, 1),$$

is invariant on the curve $\{(u, v) = (t^{1/\alpha}, t^{1/\beta}),\ t \in (0, 1)\}$, see [2]. Notice that when $\alpha = \beta$ we obtain the Cuadras-Augé survival copula from Example 3.

Let us consider a baseline survival function $S_X(x)$ and substitute $S_{X_1}(x) = [S_X(x)]^{1/\alpha}$ and $S_{X_2}(x) = [S_X(x)]^{1/\beta}$. Then the corresponding joint survival function $S(x_1, x_2) = C_{\alpha,\beta}(S_{X_1}(x_1), S_{X_2}(x_2))$ belongs to $\mathcal{A}$. In particular, if the marginals are exponentially distributed, not necessarily sharing the same parameter, then $S(x_1, x_2)$ possesses BLMP. But if we choose, say, $X_1$ exponentially distributed and $X_2$ beta distributed, the corresponding joint survival function is not a member of the class $\mathcal{A}$.
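A numerical check in the same spirit, taking the marginals as $[S_X(x)]^{1/\alpha}$ and $[S_X(x)]^{1/\beta}$, the pairing under which the Marshall-Olkin copula is invariant on the curve $(t^{1/\alpha}, t^{1/\beta})$. With a gamma baseline (shape 2, rate 1, chosen only for its closed form) equation (15) holds up to rounding; with an exponential baseline the joint survival function also satisfies the BLMP identity $S(x_1 + t, x_2 + t) = S(x_1, x_2)S(t, t)$. The values $\alpha = 0.3$, $\beta = 0.7$ are arbitrary.

```python
import math


def marshall_olkin(u, v, alpha=0.3, beta=0.7):
    """Marshall-Olkin copula min(u^(1 - alpha) v, u v^(1 - beta))."""
    return min(u ** (1.0 - alpha) * v, u * v ** (1.0 - beta))


def eq15_gap(C, S1, S2, x1, x2, t):
    """Left-hand side minus right-hand side of functional equation (15)."""
    d = C(S1(t), S2(t))
    lhs = C(S1(x1 + t), S2(x2 + t)) / d
    rhs = C(C(S1(x1 + t), S2(t)) / d, C(S1(t), S2(x2 + t)) / d)
    return lhs - rhs


def powered(S, e):
    """Marginal survival function [S(x)]^e built from a baseline S."""
    return lambda x: S(x) ** e


def gamma2_survival(x):
    """Gamma (shape 2, rate 1) survival function (1 + x) e^{-x}."""
    return (1.0 + x) * math.exp(-x)


def exp_survival(x):
    """Standard exponential survival function."""
    return math.exp(-x)
```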

The cases considered in the last two examples depend on the choice of the marginal survival functions. A more general invariance property holds for the Clayton survival copula: for any marginals, the survival copula is time invariant. We refer the reader to Sect. 4 in [2] for more details on time invariant copulas.

In fact, the Clayton survival copula is the only absolutely continuous copula that is preserved even under bivariate truncation, see [11]. The absolute continuity assumption is relaxed in Theorem 4.1 in [3]. A characterization of the survival functions that simultaneously have a Clayton survival copula and possess BLMP is given in [10], see Theorem 3.2 there.

In the next statement, we establish a necessary condition for an absolutely continuous bivariate survival function to be a member of the class $\mathcal{A}$.
Theorem 3 Let $S(x_1, x_2)$ be an absolutely continuous survival function belonging to the class $\mathcal{A}$. Then its survival copula satisfies the functional equation

$$C(u,v) = \left[u - \frac{f_{X_2}(0)\,C^2(u, 1)}{a_0}\right] C^1(u, v) + \left[v - \frac{f_{X_1}(0)\,C^1(1, v)}{a_0}\right] C^2(u, v) \quad (16)$$

for all $u, v \in [0, 1]$, where $a_0 = f_{X_1}(0) + f_{X_2}(0) > 0$, and $C^1$ and $C^2$ denote the partial derivatives of $C$ with respect to the first and second arguments, respectively.

Proof Take the derivative of both sides of (15) with respect to $t$ and substitute $t = 0$; the terms containing $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$ cancel, and (16) follows. □
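Equation (16) can be checked numerically. The sketch below (hypothetical names; analytic partial derivatives supplied by hand) evaluates the residual of (16) for the Clayton copula, which belongs to $\mathcal{A}$ for any marginals, and for Gumbel's survival copula $C(u, v) = uv\exp\{-\theta \ln u \ln v\}$, which is not time invariant; in the latter case the residual works out to $C(u, v)\,\theta^2 \ln u \ln v$.

```python
import math


def eq16_residual(C, C1, C2, f1_0, f2_0, u, v):
    """C(u, v) minus the right-hand side of (16), with a0 = f1_0 + f2_0."""
    a0 = f1_0 + f2_0
    rhs = ((u - f2_0 * C2(u, 1.0) / a0) * C1(u, v)
           + (v - f1_0 * C1(1.0, v) / a0) * C2(u, v))
    return C(u, v) - rhs


def clayton(u, v, theta=2.0):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_c1(u, v, theta=2.0):
    """Partial derivative of the Clayton copula with respect to its first argument."""
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)


def clayton_c2(u, v, theta=2.0):
    return clayton_c1(v, u, theta)
```

The densities at zero, $f_{X_1}(0)$ and $f_{X_2}(0)$, enter only through their ratio to $a_0$, so for the Clayton copula any positive pair should give a vanishing residual.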


Knowledge of the first partial derivatives of the survival copula $C(u, v)$ is sufficient to recover the distribution of $\min(U, V)$, where $U$ and $V$ are uniformly distributed with survival copula $C(u, v)$. Indeed, $P(\min(U, V) > t) = C(1 - t, 1 - t)$ for $t \in [0, 1]$. Substituting $u = v = t$ in (16) yields the corresponding equation for the main diagonal section of the copula.

Finally, we show two known functional equations which are particular cases of (16). Under the assumptions of Theorem 3, let $f_{X_1}(0) = f_{X_2}(0)$. Then

$$C(u,v) = \left[u - \frac{C^2(u, 1)}{2}\right] C^1(u, v) + \left[v - \frac{C^1(1, v)}{2}\right] C^2(u, v).$$

The same equation is obtained in Proposition 3 (ii) in [1] under the condition that $X_1$ and $X_2$ are uniformly distributed on $[0, 1]$, i.e., $f_{X_1}(0) = f_{X_2}(0) = 1$. Further, assume that $C(u, v)$ is exchangeable. Thus, $C^2(u, 1) = C^1(1, u)$, $C^2(u, v) = C^1(v, u)$, and the last equation transforms into


$$C(u,v) = \left[u - \frac{C^1(1, u)}{2}\right] C^1(u, v) + \left[v - \frac{C^1(1, v)}{2}\right] C^1(v, u),$$

see Proposition 3 on page 18 in [4].


4 Conclusions 

The time invariance of the residual lifetime vector $X_t$ of $(X_1, X_2)$ is characterized by BLMP in [9]: the joint distributions of $X_t$ and $(X_1, X_2)$ coincide independently of $t$. In this paper, we consider a more general concept, namely time invariance of the dependence functions of $X_t$ and $(X_1, X_2)$, given by (4) and (1), respectively.

We offer copula representations for the time invariance property related to bivariate survival functions of the residual lifetime vector $X_t$. While in Sect. 2 the nonaging phenomenon is associated with the dependence function $\Omega_t(x_1, x_2)$, in Sect. 3 our interest is in the survival copula $C_t(u, v)$ of $X_t$.

We are thankful to the referee and editor for their comments. 
Nonparametric Copula Density Estimation
Using a Petrov-Galerkin Projection


Dana Uhlig and Roman Unger 


Abstract Nonparametric copula density estimation is a meaningful tool for analyzing the dependence structure of a random vector from given samples. Usually, kernel estimators or penalized maximum likelihood estimators are considered. We propose solving the Volterra integral equation

$$\int_0^{u_1} \cdots \int_0^{u_d} c(s_1, \ldots, s_d)\, ds_1 \cdots ds_d = C(u_1, \ldots, u_d)$$

to find the copula density $c(u_1, \ldots, u_d) = \frac{\partial^d C}{\partial u_1 \cdots \partial u_d}$ of the given copula $C$. In the statistical framework, the copula $C$ is not available and we replace it by the empirical
copula of the pseudo samples, which converges to the unobservable copula C for 
large samples. Hence, we can treat the copula density estimation from given samples 
as an inverse problem and consider the instability of the inverse operator, which has 
an important impact if the input data of the operator equation are noisy. The well-known
curse of high dimensions usually results in huge nonsparse systems of linear equations
after discretizing the operator equation. We present a Petrov-Galerkin projection for 
the numerical computation of the linear integral equation. A special choice of test 
and ansatz functions leads to a very special structure of the linear equations, such 
that we are able to estimate the copula density also in higher dimensions. 



1 Copula Density Estimation as an Inverse Problem 


A copula is a multivariate distribution function of a d-dimensional random vector with uniformly distributed margins. Sklar's theorem ensures that any joint multivariate distribution $F$ of a d-dimensional vector $X = (X_1, \ldots, X_d)^T$ with margins $F_j$ ($j = 1, \ldots, d$) can be expressed as

$$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)) \quad \forall x = (x_1, \ldots, x_d)^T \in \mathbb{R}^d,$$

where the copula is unique on $\operatorname{range}(F_1) \times \cdots \times \operatorname{range}(F_d)$; that is, for continuous margins $F_1, \ldots, F_d$ the copula $C$ is unique on the whole domain. Consequently, the copula contains the complete dependence structure of the random vector $X$. For a detailed introduction to copulas and their properties see, for example, [8, 9, Chap. 5] or [10]. In risk management, knowledge of the dependence is of paramount importance.

If the copula is sufficiently smooth, the copula density

$$c(u_1, \ldots, u_d) = \frac{\partial^d C}{\partial u_1 \cdots \partial u_d}(u_1, \ldots, u_d) \quad (1)$$


exists, and the density then shows the dependence structure more clearly, because the graphs of different copulas usually look very similar, with only small differences in slope. For this reason, the reconstruction of the copula density is a vibrant field of research in finance and many other scientific fields. In practical tasks in particular, the dependence structure of more than two random variables is of special interest, and the dimension $d$ can be large. In nonparametric statistical estimation, kernel estimators are usually used, but they often suffer from boundary bias. There are also spline- or wavelet-based approximation methods, but most of them are only discussed in the two-dimensional case. Likewise, in [12], the authors discuss a penalized nonparametric maximum likelihood method in the two-dimensional case. A detailed survey of the literature on nonparametric copula density estimation can be found in [6]. However, most nonparametric methods are faced with the curse of dimensionality, so that numerical computations are only possible for sufficiently low dimensions. Indeed, many authors discuss only the two-dimensional case in nonparametric copula density estimation.

In this paper we develop an alternative approach based on the theory of inverse problems. The copula density (1) exists only for absolutely continuous copulas. Obviously, the copula is not observable for a sample $X_1, X_2, \ldots, X_T$ in the statistical framework, but we can approximate it with the empirical copula

$$\hat{C}(u) = \frac{1}{T}\sum_{j=1}^{T} \mathbb{1}_{\{U_j \le u\}} = \frac{1}{T}\sum_{j=1}^{T}\prod_{k=1}^{d} \mathbb{1}_{\{U_{kj} \le u_k\}} \quad (2)$$

of the margin-transformed pseudo samples $U_1, U_2, \ldots, U_T$ with $U_{kj} = \hat{F}_k(X_{kj})$, where $\hat{F}_k$ denotes the empirical margins. It is well known that the empirical copula converges uniformly to the copula (see [2]):

$$\max_{u \in [0,1]^d} \left|\hat{C}(u) - C(u)\right| = \mathcal{O}\left(\sqrt{\frac{\log\log T}{T}}\right) \quad \text{a.s. for } T \to \infty. \quad (3)$$
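The empirical copula (2) is straightforward to compute from ranks. In this sketch the pseudo samples are $\text{rank}/T$, which is one common convention for the empirical margins (others use $\text{rank}/(T+1)$); ties are assumed absent, and the function names are hypothetical.

```python
def pseudo_observations(samples):
    """Rank-transform each margin: U_kj = (rank of X_kj among X_k1..X_kT) / T.
    Assumes continuous data without ties."""
    T, d = len(samples), len(samples[0])
    U = [[0.0] * d for _ in range(T)]
    for k in range(d):
        order = sorted(range(T), key=lambda j: samples[j][k])
        for rank, j in enumerate(order, start=1):
            U[j][k] = rank / T
    return U


def empirical_copula(U, u):
    """Empirical copula (2): fraction of pseudo samples with U_j <= u componentwise."""
    return sum(all(Uj[k] <= uk for k, uk in enumerate(u)) for Uj in U) / len(U)
```

For comonotone data the empirical copula on the diagonal equals the diagonal value, while for countermonotone data it vanishes at $(0.5, 0.5)$.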


Therefore, we treat the empirical copula as a noisy representation $C^\delta = \hat{C}$ of the unobservable copula $C$. Estimating the density thus requires differentiating the empirical copula, which is obviously not smooth. However, every density satisfies the integral equation

$$\int_0^{u_1} \cdots \int_0^{u_d} c(s_1, \ldots, s_d)\, ds_1 \cdots ds_d = C(u_1, \ldots, u_d) \quad \forall u = (u_1, \ldots, u_d)^T \in \Omega = [0, 1]^d, \quad (4)$$

which can be seen as a weak formulation of Eq. (1). In the following, we therefore consider the linear Volterra integral operator $A \in \mathcal{L}(L^1(\Omega), L^2(\Omega))$ and solve the linear operator equation

$$Ac = C \quad (5)$$

to find the copula density $c$. We assume attainability, which means $C \in \mathcal{R}(A)$; hence we only consider copulas $C \in L^2(\Omega)$ for which a solution $c \in L^1(\Omega)$ exists.

The injective Volterra integral operator is well studied in the inverse problem literature. Even in the one-dimensional case, the problem is ill-posed, because the inverse $A^{-1}$, which is the differential operator, is not continuous. Hence, computing (1) numerically leads to instabilities even if the right-hand side of (5) has only a small data error. Because the solution is sensitive to small data errors, regularization methods that overcome the instability are discussed in the inverse problem literature. For a detailed introduction to regularization see, for example, [4, 13].
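The instability is visible already in one dimension. This toy sketch is not the Petrov-Galerkin method of this paper: it differentiates the distribution function $F(u) = u^2$ on $[0, 1]$ (density $2u$) by forward differences after adding uniform noise of size at most $\delta$ to every evaluation. The error behaves like $h + 2\delta/h$ and blows up as $h \to 0$; the names, the seed and the noise model are our assumptions.

```python
import random


def max_derivative_error(h, delta, seed=0, points=100):
    """Largest error of forward differences of noisy F(u) = u^2 (density 2u)
    when every evaluation of F carries uniform noise in [-delta, delta]."""
    rng = random.Random(seed)
    noisy = lambda u: u * u + rng.uniform(-delta, delta)
    worst = 0.0
    for i in range(points):
        u = 0.1 + 0.8 * i / points
        estimate = (noisy(u + h) - noisy(u)) / h
        worst = max(worst, abs(estimate - 2.0 * u))
    return worst
```

Without noise the error is exactly the truncation term $h$; with noise, shrinking $h$ makes the result worse, which is the behavior regularization has to control.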

In Sect. 2 we discuss a discretization of the integral equation (4), and in Sect. 3 we illustrate the numerical instability that arises when the empirical copula is used instead of the exact one and discuss regularization methods for the discretized problem.

The basics of the numerical implementation of the problem, and especially the details of the Kronecker multiplication, are presented in the authors' working paper [14]; [15] discusses why the Petrov-Galerkin projection is not a simple counting algorithm. This paper gives a summary of the proposed method for effective computation of the right-hand side in larger dimensions and discusses in more detail the analytical aspects of the inverse problem and the reasons for the existence of the Kronecker structure.
2 Numerical Approximation


We discuss the numerical computation of the copula density $c \in X = L^1(\Omega)$ from a given copula $C \in Y = L^2(\Omega)$, which is in principle a numerical differentiation and, in higher dimensions, a very hard problem (see [1]). Moreover, in practical applications, the measured data $C^\delta$ have some noise $\delta$ with $\|C - C^\delta\|_Y \le \delta$, and very often the function is not smooth enough, that is, $C^\delta \notin C^1(\Omega)$ even if $C \in C^1(\Omega)$, which leads to numerical instabilities that make the usual numerical differentiation impossible. For the sake of convenience, we write

$$\int_0^{u} c(s)\, ds = C(u) \quad \forall u = (u_1, \ldots, u_d)^T \in \Omega = [0, 1]^d$$

as a short form of Eq. (4). We propose applying a Petrov-Galerkin projection (see [5]) for some discretization size $h$ and consider the finite dimensional approximation

$$c_h(s) = \sum_{j=1}^{N} c_j \phi_j(s), \quad (6)$$

where $\Phi = \{\phi_1, \phi_2, \ldots, \phi_N\}$ is a basis of the ansatz space $V_h$. The vector of coefficients $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N$ is chosen such that


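$$\int_\Omega \int_0^u c_h(s)\, ds\, \psi(u)\, du = \int_\Omega C(u)\, \psi(u)\, du \quad \forall \psi \in \tilde{V}_h. \quad (7)$$

It is sufficient to fulfill Eq. (7) for $N$ linearly independent test functions $\psi_i \in \tilde{V}_h$. This yields the system of linear equations

$$K c = C \quad (8)$$

with right-hand side

$$C_i = \int_\Omega C(u)\, \psi_i(u)\, du, \quad i = 1, \ldots, N, \quad (9)$$

and the $N \times N$ matrix $K$ with

$$K_{ij} = \int_\Omega \int_0^u \phi_j(s)\, ds\, \psi_i(u)\, du.$$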
If the exact copula is replaced by the empirical copula, we obtain a noisy representation $C^\delta$ with

$$C_i^\delta = \int_\Omega \hat{C}(u)\, \psi_i(u)\, du, \quad i = 1, \ldots, N, \quad (10)$$

of the exact right-hand side $C$. A typical phenomenon of ill-posed inverse problems is that the numerically computed solution based on noisy data (10) oscillates strongly unless a proper regularization is chosen. This problem is not caused by the numerical approximation, but rather by the discontinuity of the inverse operator. This will be illustrated in Sect. 3: Figure 3 shows the reconstructed density of the Student copula for exact data (9), whereas Fig. 5 shows it for different noise levels.

In principle, we can choose arbitrary ansatz functions $\phi_j \in V_h$ and test functions $\psi_i \in \tilde{V}_h$. However, having the curse of high dimensions in mind, we choose very simple ansatz functions such that the matrix $K$ gets a very special structure, allowing us to solve (8) and compute the approximated copula density also for higher-dimensional copulas. Obviously, the approximated density (6) is not smooth. In order to obtain a smoother approximated copula $C_h$ with

$$C_h(u) = \int_0^u c_h(s)\, ds,$$

we choose the test functions as integrated ansatz functions, such that the approximated copula

$$C_h(u) = \sum_{j=1}^{N} c_j \psi_j(u)$$

is smoother than the approximated density.

We discretize the domain $\Omega$ by splitting each one-dimensional interval $[0, 1]$ into $n$ equal subintervals of length $h = 1/n$. Hence, we obtain $N = n^d$ equal-sized hypercubes and call these elements $e_1, \ldots, e_N$. We number the elements in a specific order, illustrated in Fig. 1, such that for the $(d+1)$-dimensional problem the first $n^d$ elements of the new problem have the same number and location as the elements of the d-dimensional problem.

We set $N = n^d$ and choose the ansatz functions

$$\phi_j(u) = \begin{cases} 1, & u \in e_j, \\ 0, & \text{otherwise}, \end{cases} \quad (11)$$


and the test functions $\psi_i$ as the integrated ansatz functions

$$\psi_i(u) = \int_0^u \phi_i(s)\, ds. \quad (12)$$

Fig. 1 Discretization of the domain $\Omega = [0, 1]^d$ with mesh size $h$. a $d = 2$: the elements are numbered $1, \ldots, n$ along $u_1$ in the first row, $n + 1, \ldots, 2n$ in the second row, up to $(n-1)n + 1, \ldots, n^2$ in the last row. b $d = 3$


In contrast to finite element discretizations, the system matrix $K$ is not sparse, and the system size $N = n^d$ grows exponentially with the dimension $d$. Straightforward assembly and solution of the linear system (8) becomes impossible for the usual discretizations $n$. Even in the three-dimensional case, storing the system matrix for $n = 80$ needs approximately one terabyte, even when exploiting symmetry, and the computing times for assembling and solving such systems become enormous.
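The storage estimate can be reproduced with a little arithmetic, assuming 8-byte double-precision entries and storage of only the upper triangle (including the diagonal) of the symmetric matrix; both assumptions are ours.

```python
def dense_matrix_bytes(n, d, bytes_per_entry=8, symmetric=True):
    """Storage for the dense N x N system matrix with N = n**d.
    With symmetric=True only the upper triangle including the diagonal is kept."""
    N = n ** d
    entries = N * (N + 1) // 2 if symmetric else N * N
    return entries * bytes_per_entry
```

For $n = 80$ and $d = 3$ this gives $N = 512{,}000$ and about $1.05 \times 10^{12}$ bytes, i.e., roughly one terabyte.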

The choices (11) and (12) yield a structure of the $N \times N$ system matrix $K$, illustrated in Fig. 2, that allows us to solve (8) also for $d > 2$. The matrix plots show that the $n \times n$ system matrix of the one-dimensional case is equivalent to the upper left $n \times n$ corners of the two- and three-dimensional matrices. Moreover, the other parts of the system matrices are scaled replications of the one-dimensional $n \times n$ system matrix. This effect is based on a Kronecker factorization of the d-dimensional system matrix into $d$ system matrices of the one-dimensional problem.

Fig. 2 Matrix plots of the system matrix $K$ for $n = 4$ and different dimensions $d$. a System matrix for $d = 1$. b System matrix for $d = 2$. c System matrix for $d = 3$

One important reason for this structure is that the chosen ansatz functions decompose into a product of one-dimensional ansatz functions. To illustrate this, we consider the lowest corner $b^i$ of the $i$th element and define the one-dimensional function

$$\phi_k^i(x) = \mathbb{1}_{[b_k^i,\, b_k^i + h]}(x), \quad k = 1, \ldots, d.$$

This yields

$$\phi_i(u) = \prod_{k=1}^{d} \phi_k^i(u_k) \quad (13)$$

as well as

$$\psi_i(u) = \prod_{k=1}^{d} \psi_k^i(u_k) \quad (14)$$

with the one-dimensional test functions

$$\psi_k^i(u) = \int_0^u \phi_k^i(s)\, ds.$$


We only formulate the main result that allows us to compute solutions of (8) also for higher dimensions $d$. Details and proofs can be found in the working paper [14].

Theorem 1 The system matrix for the $(d+1)$-dimensional case can be extracted from the one- and d-dimensional system matrices.

Corollary 1 The system matrix ${}^{(d)}K$ is the d-fold Kronecker product of the $n \times n$ matrix ${}^{(1)}K$, and the inverse system matrix of the d-dimensional problem is the d-fold Kronecker product of the one-dimensional inverse system matrix.
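Because $\int_0^u \phi_j(s)\,ds = \psi_j(u)$ for the choices (11) and (12), the entries are $K_{ij} = \int_\Omega \psi_i \psi_j \, du$, which factorize by (14). The following sketch (our own; names are hypothetical) assembles ${}^{(1)}K$ with two-point Gauss quadrature per cell, which is exact here because the integrands are piecewise quadratic, and checks for $d = 2$ that direct assembly agrees with ${}^{(1)}K \otimes {}^{(1)}K$ when elements are numbered with the $u_1$ index running fastest.

```python
import math

# 2-point Gauss-Legendre nodes on [0, 1]; each weight is 1/2 (scaled by the cell length)
NODES = (0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0))


def psi_1d(x, b, h):
    """One-dimensional test function: integral from 0 to x of the indicator of [b, b + h]."""
    return min(max(x - b, 0.0), h)


def quad_points(n):
    """Gauss points and weights on the n cells of [0, 1]; exact for piecewise cubics."""
    h = 1.0 / n
    return [((m + g) * h, 0.5 * h) for m in range(n) for g in NODES]


def system_matrix_1d(n):
    """(1)K with entries int_0^1 psi_i(u) psi_j(u) du."""
    h, pts = 1.0 / n, quad_points(n)
    return [[sum(w * psi_1d(x, i * h, h) * psi_1d(x, j * h, h) for x, w in pts)
             for j in range(n)] for i in range(n)]


def system_matrix_2d_direct(n):
    """(2)K by tensor Gauss quadrature; element e has u1 index e % n and u2 index e // n."""
    h, pts = 1.0 / n, quad_points(n)

    def psi(e, x1, x2):
        return psi_1d(x1, (e % n) * h, h) * psi_1d(x2, (e // n) * h, h)

    N = n * n
    return [[sum(w1 * w2 * psi(e, x1, x2) * psi(f, x1, x2)
                 for x1, w1 in pts for x2, w2 in pts)
             for f in range(N)] for e in range(N)]


def kron(A, B):
    """Kronecker product of two dense matrices given as lists of rows."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]
```

The first diagonal entry has the closed form $h^3/3 + h^2(1 - h)$, which offers a quick independent check of the quadrature.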


Following Corollary 1, we only have to assemble the one-dimensional system matrix of dimension $n \times n$, compute its inverse ${}^{(1)}K^{-1}$, and perform the Kronecker multiplication to compute the solution $c = {}^{(d)}K^{-1} C$ of (8). Details of the algorithm and an effective Kronecker multiplication are given in [14]. Using effective parallelization methods, the running time can be reduced further. In fact, the computation of the right-hand side (9) is the crucial part and much more expensive than solving the linear system, because we have to evaluate $N = n^d$ different d-dimensional integrals over the whole domain $\Omega$. Note that for our special choice of ansatz functions (6), we have

$${}^{(d+1)}K = {}^{(1)}K \otimes {}^{(d)}K,$$

$${}^{(d)}K = {}^{(1)}K \otimes {}^{(1)}K \otimes \cdots \otimes {}^{(1)}K, \quad (15)$$

$${}^{(d)}K^{-1} = {}^{(1)}K^{-1} \otimes {}^{(1)}K^{-1} \otimes \cdots \otimes {}^{(1)}K^{-1}, \quad (16)$$

which also reduces the numerical effort. In higher dimensions, the number of elements $e_i$ with zero values grows, so that using Eq. (16) instead of (9) improves the running times.
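The Kronecker multiplication in the solution step can be carried out without forming the $n^d \times n^d$ matrix: applying an $n \times n$ factor along each coordinate direction in turn costs $\mathcal{O}(d\,n^{d+1})$ operations. This is an illustrative sketch under our own assumptions, not the implementation of [14]; the small Gauss-Jordan inverse is adequate only for the $n \times n$ factor.

```python
def apply_kron_power(M, x, d):
    """Return (M kron M kron ... kron M) x with d factors, without forming the big matrix.
    Entry e of x belongs to the multi-index (i_1, ..., i_d) with e = i_1 + n i_2 + ...;
    since all factors are equal, M is applied along each axis in turn."""
    n = len(M)
    x = list(x)
    stride = 1
    for _ in range(d):
        y = [0.0] * len(x)
        for e in range(len(x)):
            i = (e // stride) % n          # coordinate of element e along the current axis
            base = e - i * stride          # same element with that coordinate set to 0
            y[e] = sum(M[i][j] * x[base + j * stride] for j in range(n))
        x = y
        stride *= n
    return x


def invert(M):
    """Gauss-Jordan inversion with partial pivoting, for the small n x n matrix (1)K."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]
```

In this sketch the solution of (8) would be `apply_kron_power(invert(K1), rhs, d)`, with `K1` the one-dimensional system matrix.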

In the most practically relevant case, where the components of the right-hand side (10) are evaluated for the empirical copula (2), the numerical effort can be radically reduced, because the d-dimensional integral

$$C_i^\delta = \int_\Omega \hat{C}(u)\, \psi_i(u)\, du = \frac{1}{T}\sum_{j=1}^{T}\int_\Omega \prod_{k=1}^{d} \mathbb{1}_{\{U_{kj} \le u_k\}}\, \psi_k^i(u_k)\, du = \frac{1}{T}\sum_{j=1}^{T}\prod_{k=1}^{d}\int_0^1 \mathbb{1}_{\{U_{kj} \le s\}}\, \psi_k^i(s)\, ds \quad (17)$$

degenerates into a product of $d$ one-dimensional integrals

$$\int_0^1 \mathbb{1}_{\{U_{kj} \le s\}}\, \psi_k^i(s)\, ds = \begin{cases} h(1 - b_k^i) - \frac{1}{2}h^2, & U_{kj} < b_k^i, \\ h(1 - b_k^i) - \frac{1}{2}h^2 - \frac{1}{2}(U_{kj} - b_k^i)^2, & b_k^i \le U_{kj} \le b_k^i + h, \\ h(1 - U_{kj}), & U_{kj} > b_k^i + h, \end{cases}$$

using Eqs. (13) and (14). In this case, the numerical effort is of order $\mathcal{O}(NTd)$, which is an extreme improvement over $\mathcal{O}(N\,3^d\,T + n^2 + n^3)$ if the d-dimensional integrals (10) are computed numerically by a usual $3^d$-point Gauss formula. We want to point out that the computation of the right-hand side (10) for the empirical copula based on formula (17) is still possible for $d = 9$, whereas the computational effort for computing (16) for an arbitrary given copula $C$ is exorbitant, even if the discretization size $n$ is chosen moderately. The numerical effort is illustrated in Table 1.
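The closed-form factor in (17) translates directly into code. The sketch below (hypothetical names; elements numbered with the first coordinate running fastest) evaluates the right-hand side (10) for given pseudo samples in $\mathcal{O}(NTd)$ operations.

```python
def one_d_factor(U, b, h):
    """Closed form of int_0^1 1{U <= s} psi(s) ds for psi(s) = min(max(s - b, 0), h)."""
    if U < b:
        return h * (1.0 - b) - 0.5 * h * h
    if U <= b + h:
        return h * (1.0 - b) - 0.5 * h * h - 0.5 * (U - b) ** 2
    return h * (1.0 - U)


def rhs_empirical(U_samples, n):
    """Right-hand side (10) for the empirical copula via the product formula (17).
    Element e has lowest corner b^e with coordinates ((e // n**k) % n) * h."""
    T, d = len(U_samples), len(U_samples[0])
    h = 1.0 / n
    rhs = []
    for e in range(n ** d):
        corner = [((e // n ** k) % n) * h for k in range(d)]
        total = 0.0
        for Uj in U_samples:
            prod = 1.0
            for k in range(d):
                prod *= one_d_factor(Uj[k], corner[k], h)
            total += prod
        rhs.append(total / T)
    return rhs
```

Each sample contributes one product of $d$ scalar factors per element, which is where the $\mathcal{O}(NTd)$ cost comes from.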

Note that, contrary to what might be expected, the vector $c = (c_1, \ldots, c_N)^T$ does not count the number of samples in the elements, even though the approximated solution $c_h$ is a piecewise constant function on the elements: the Petrov-Galerkin projection is not simple counting (for more details see [15]).


Table 1 Computing times using (16) for the independence copula

 d     n             N   s_rhs     t_rhs (s)   t_rhs using (16)   t_solve (s)   ||c - c_h||_L1(Ω)
 2    30           900       1           0.2              < 1 s        0.0005             2.5e-10
 2    60         3,600       1           2.2              < 1 s         0.003              4.9e-9
 2   100        10,000       3           6.7                3 s          0.01              2.8e-8
 3    30        27,000      10          60.7               18 s          0.01              4.8e-7
 3    60       216,000      30         1,440              379 s          0.13              3.4e-5
 3   100     1,000,000      30        32,163            8,031 s          1.04              7.1e-4
 4    30       810,000      30        72,989           10,876 s          0.29              1.2e-3
 5    30    24,300,000      30     ~112 days                  -             -                   -
 6    30   729,000,000      30    ~270 years                  -             -                   -
2.1 Examples

In order to illustrate the computing times and the approximation quality, we use the independence copula

$$C(u) = \prod_{k=1}^{d} u_k,$$

which has the exact solution $c(u) = 1$. Please note that for this example we used the exact copula as the right-hand side without generating samples. So there is no data noise, and hence $\delta = 0$, which allows us to separate the approximation error from the ill-posedness resulting from the discontinuity of the inverse operator $A^{-1}$.

Many authors (see, for example, [11]) look at the integrated square error, which is the squared $L^2$-norm of the difference between the copula density and its approximation. For the independence copula, the integrated square error can easily be computed:

$$\mathrm{ISE}(c, c_h) = \|c - c_h\|_{L^2}^2 = \frac{1}{N}\left\|c - (1, 1, \ldots, 1)^T\right\|_2^2.$$

However, this error measure is unsuitable, because the natural space for densities is $L^1$ instead of $L^2$ (see [3]), and so we measure the difference in the $L^1$-norm, which can also be easily computed for the independence copula:

$$\|c - c_h\|_{L^1(\Omega)} = \int_\Omega |c(u) - c_h(u)|\, du = \frac{1}{N}\left\|c - (1, 1, \ldots, 1)^T\right\|_1.$$
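The independence-copula experiment can be reproduced on a small scale. This sketch (our own, with hypothetical names) assembles ${}^{(1)}K$, uses the exact right-hand side (9) of $C(u) = u_1 \cdots u_d$, which factorizes like (14), solves with the Kronecker structure, and reports $\|c - c_h\|_{L^1(\Omega)} = \frac{1}{N}\|c - (1, \ldots, 1)^T\|_1$. Since $\sum_j \psi_j(u) = u$, the exact density $c \equiv 1$ lies in the ansatz space, so only rounding error should remain.

```python
import math

NODES = (0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0))  # Gauss-Legendre on [0, 1]


def psi_1d(x, b, h):
    return min(max(x - b, 0.0), h)


def integrate_cells(n, f):
    """Exact for piecewise cubics with breakpoints at multiples of 1/n."""
    h = 1.0 / n
    return sum(0.5 * h * f((m + g) * h) for m in range(n) for g in NODES)


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def apply_kron_power(M, x, d):
    """(M kron ... kron M) x with d factors, applied axis by axis."""
    n, stride = len(M), 1
    for _ in range(d):
        y = [0.0] * len(x)
        for e in range(len(x)):
            i = (e // stride) % n
            base = e - i * stride
            y[e] = sum(M[i][j] * x[base + j * stride] for j in range(n))
        x, stride = y, stride * n
    return x


def independence_l1_error(n, d):
    """Solve (8) with the exact right-hand side of C(u) = u_1 ... u_d; return the L1 error."""
    h = 1.0 / n
    K1 = [[integrate_cells(n, lambda x: psi_1d(x, i * h, h) * psi_1d(x, j * h, h))
           for j in range(n)] for i in range(n)]
    r1 = [integrate_cells(n, lambda x: x * psi_1d(x, i * h, h)) for i in range(n)]
    N = n ** d
    rhs = [math.prod(r1[(e // n ** k) % n] for k in range(d)) for e in range(N)]
    c = apply_kron_power(invert(K1), rhs, d)
    return sum(abs(cj - 1.0) for cj in c) / N
```

For small grids such as $n = 10$, $d = 2$ or $n = 5$, $d = 3$, the reported error is at rounding level.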


In Table 1, we give the following quantities for different discretization steps $n$ and dimensions $d$: the system size $N = n^d$, the computing time $t_{rhs}$ for assembling the right-hand side, the time $t_{solve}$ for solving the system, the number $s_{rhs}$ of computing slaves, and the $L^1$-approximation error. For the computation of the right-hand side, a parallel OpenMPI implementation was used with $s_{rhs}$ computing slaves. For solving the system with the Kronecker factorization, a sequential C++ implementation is used. The exact computation of an ordinary right-hand side without using the product structure becomes impossible for $d \ge 5$, and the times given there are estimated computing times. In summary, the example of the independence copula shows that for exact right-hand side data the approximation error is small, but it grows with decreasing discretization size $h = 1/n$. We want to point out that this is a typical phenomenon of inverse problems, called “regularization by discretization”.

If we consider the more practical relevant case, that the empirical copula, gener- 
ated by T independent samples of the independence copula, is used, we are faced 
with data noise 8 > 0 and ill-posedness. Table 2 shows that the computation based 
on (17) is still possible for d ~ 10. However, the approximation error increase with 
the dimension d , which is a direct consequence of the ill-posedness, because the 
Table 2 Computing times using (17) for T = 100,000 samples

  d     n                N   s_rhs       t_rhs   t_solve (s)   ‖c − c_h‖_L¹(Ω)
  2    30              900       1       1.3 s        0.0002           8.89e-2
  2    60            3,600       1       5.3 s         0.002           1.76e-1
  2   100           10,000       3         5 s         0.014           2.96e-1
  3    30           27,000      10       7.2 s         0.013           5.17e-1
  3    60          216,000      30        19 s          0.11           1.45e+0
  3   100        1,000,000      30        86 s          0.95           2.71e+0
  4    30          810,000      30        97 s          0.25           2.72e+0
  5    30       24,300,000      30     3,607 s          8.49           1.82e+2
  6    10        1,000,000      30       197 s          0.14           3.50e+0
  7    10       10,000,000      30     2,371 s          1.68           3.54e+1
  8    10      100,000,000      30    26,329 s          18.2           7.28e+3
  9    10    1,000,000,000      30   303,239 s           253           9.63e+5
 10    10   10,000,000,000      30   ~40 days         2,025                 -




Fig. 4 Frank copula, θ = 4, n = 50: (a) reconstructed density c, (b) copula C
condition number of the system matrix K is the d-th power of the condition number 
of the one-dimensional system matrix K̂. 
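The d-th power relation follows from the fact that the singular values of a Kronecker product are the products of the singular values of its factors, so in the spectral norm cond(K̂ ⊗ ⋯ ⊗ K̂) = cond(K̂)^d. The sketch below checks this numerically. The one-dimensional matrix is a hypothetical stand-in (discrete cumulative integration with h = 1/n), not the paper's actual one-dimensional system matrix, and the helper names are illustrative.

```python
import numpy as np
from functools import reduce

def integration_matrix(n: int) -> np.ndarray:
    """Hypothetical 1D system matrix (discrete cumulative integration, h = 1/n)."""
    return np.tril(np.ones((n, n))) / n

def kron_power(A: np.ndarray, d: int) -> np.ndarray:
    """d-fold Kronecker product A ⊗ A ⊗ ... ⊗ A."""
    return reduce(np.kron, [A] * d)

n = 5
K_hat = integration_matrix(n)
cond_1d = np.linalg.cond(K_hat)  # 2-norm condition number of the 1D matrix
for d in (1, 2, 3):
    K = kron_power(K_hat, d)     # N x N system matrix with N = n**d
    print(d, K.shape, np.linalg.cond(K), cond_1d ** d)
```

Since cond(K̂) > 1 for any non-trivial discretization, the conditioning of the full system deteriorates exponentially in d, which matches the growth of the errors in Table 2.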

Our proposed method works not only for the rather simple independence copula 
but also quite well for all typical copula families. For noise-free right-hand sides, 
the approximation error is negligible. Figures 3 and 4 show the reconstructed 
densities for the Student and Frank copula, using exact data for the right-hand side. 
In [14], numerical results for other copula families, such as the Gaussian, Gumbel, 
or Clayton copula, can also be found. However, ill-posedness is expected when 
empirical copulas are used and we are faced with data noise, which we discuss in 
the next section. 


3 Ill-Posedness and Regularization 

Note that in real problems, the copula C is not known and we only have noisy data 
(10) instead of (9). To illustrate the expected numerical instabilities, we simulated 
T samples for each two-dimensional copula and present the nonparametrically 
reconstructed densities obtained with the Petrov-Galerkin projection on a grid of size n = 50. 



Fig. 5 Student copula density, ρ = 0.5, ν = 1, n = 50. (a) T = 1,000,000; (b) T = 100,000; 
(c) T = 10,000; (d) T = 1,000
A typical feature of ill-posed inverse problems is that the numerical instability 
decreases as the grid size n decreases, which can also be seen in Table 1. Therefore, 
we fix the grid size at n = 50 and study the influence of the sample size T. 

Because of (3), the data noise δ increases as T decreases. Figures 5 and 6 show 
the expected ill-posedness appearing for decreasing sample size T. Of course, these 
instabilities also occur for the other copula families, but we restrict our illustration 
here to these two examples. More examples can be found in [14]. 
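To see how the data noise δ depends on T, one can measure the maximal deviation between the empirical copula and the true copula on a grid. The sketch below does this for the two-dimensional independence copula, assuming pseudo-observations R/T (column-wise ranks divided by T) and an evaluation grid u = k/m; the helper names are hypothetical. By the standard √T convergence of the empirical copula, the error should shrink roughly like T^(-1/2).

```python
import numpy as np

def pseudo_observations(x: np.ndarray) -> np.ndarray:
    """Column-wise normalized ranks R / T in (0, 1] (hypothetical helper)."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..T per column
    return ranks / x.shape[0]

def empirical_copula_2d(x: np.ndarray, m: int) -> np.ndarray:
    """Empirical copula C_T(u1, u2) of T x 2 samples on the grid u = k/m, k = 1..m."""
    u = pseudo_observations(x)
    grid = np.arange(1, m + 1) / m
    ind1 = (u[:, [0]] <= grid).astype(float)  # T x m indicators 1{U_t1 <= u1}
    ind2 = (u[:, [1]] <= grid).astype(float)  # T x m indicators 1{U_t2 <= u2}
    return ind1.T @ ind2 / x.shape[0]         # m x m matrix of joint frequencies

m = 50
grid = np.arange(1, m + 1) / m
C_indep = np.outer(grid, grid)                # independence copula C(u1, u2) = u1 * u2
rng = np.random.default_rng(0)
errors = {}
for T in (1_000, 10_000, 100_000):
    samples = rng.random((T, 2))              # T samples from the independence copula
    errors[T] = np.abs(empirical_copula_2d(samples, m) - C_indep).max()
    print(f"T = {T:>7,d}: max grid error = {errors[T]:.4f}")
```

The matrix product of the two indicator arrays counts all joint events at once, so the whole grid is evaluated with one O(T m²) operation instead of a loop over grid points.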

To overcome the ill-posedness, an appropriate regularization of the discretized 
problem (8) is required. Figures 7 and 8 show the reconstructed copula densities for 
T = 1,000 and T = 10,000 samples using the well-known Tikhonov regularization, 
where choosing the regularization parameter α = 0 means no regularization. The 
left-hand panels of the figures show the unregularized solutions. The choice of the 
regularization parameter α = 10⁻⁸ is very naive and arbitrary and serves only to 
demonstrate how the instability can be handled; a better parameter choice should 
improve the reconstructed densities. Discussing an appropriate parameter choice rule 
for Tikhonov regularization, as well as other regularization methods, is left for 
future work. 
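For a matrix K and noisy data b, Tikhonov regularization computes the minimizer of ‖Kc − b‖² + α‖c‖², i.e. the solution of (KᵀK + αI)c = Kᵀb; α = 0 recovers the unregularized solution. The sketch below applies this formula to a hypothetical one-dimensional toy problem (cumulative integration of a smooth density on n = 50 cells), not to the copula problem of Figs. 7 and 8. The noise level 10⁻² and the value α = 10⁻³ are arbitrary choices for this scaling and unrelated to the α = 10⁻⁸ used in the figures.

```python
import numpy as np

def tikhonov_solve(K: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Minimize ||K c - b||_2^2 + alpha * ||c||_2^2 via the normal equations."""
    N = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(N), K.T @ b)

n = 50
h = 1.0 / n
K = h * np.tril(np.ones((n, n)))             # hypothetical 1D forward matrix
u = (np.arange(n) + 0.5) * h                 # cell midpoints
c_true = 1.0 + 0.5 * np.cos(2 * np.pi * u)   # hypothetical density values
rng = np.random.default_rng(2)
b_noisy = K @ c_true + 1e-2 * rng.standard_normal(n)  # noisy right-hand side

for alpha in (0.0, 1e-3):
    c_alpha = tikhonov_solve(K, b_noisy, alpha)
    rms = np.sqrt(np.mean((c_alpha - c_true) ** 2))
    print(f"alpha = {alpha:g}: RMS error = {rms:.3f}")
```

With these settings the regularized error should be clearly smaller than the unregularized one. As in the text, α is picked by hand here rather than by a parameter choice rule.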

To avoid assembling the full system matrix K, which leads to high-dimensional 
systems for d > 2, we are interested in regularization methods 
Fig. 7 Regularized Student copula density, ρ = 0.5, ν = 1, n = 50. (a) α = 0, T = 1,000 samples; 
(b) α = 10⁻⁸, T = 1,000 samples; (c) α = 0, T = 10,000 samples; (d) α = 10⁻⁸, T = 10,000 samples 


using the special structure (15). In particular, all regularization methods based on 
the singular value or eigenvalue decomposition of K can be handled easily, because 
the eigenvalue decomposition of the one-dimensional matrix $\hat{K} = V \Lambda V^T$ leads 
to the eigenvalue decomposition of the system matrix 

$$K = (V \otimes \cdots \otimes V)\,(\Lambda \otimes \cdots \otimes \Lambda)\,(V^T \otimes \cdots \otimes V^T).$$
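Because of this factorization, the full N × N matrix with N = n^d never has to be formed: transforming the data tensor with the one-dimensional factors along each axis, filtering the coefficients, and transforming back yields the regularized solution. The sketch below does this for Tikhonov regularization using the SVD of K̂, the singular value analogue of the eigenvalue decomposition above, since the hypothetical one-dimensional matrix used here (a lower-triangular integration matrix) is not symmetric. The helper names apply_kron and tikhonov_kron are hypothetical, and the right-hand side is random test data.

```python
import numpy as np
from functools import reduce

def apply_kron(M: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Compute (M ⊗ ... ⊗ M) vec(X) for X of shape (n,)*d without forming the product.

    Uses row-major (C-order) vectorization, which matches the index ordering of np.kron.
    """
    for axis in range(X.ndim):
        X = np.moveaxis(np.tensordot(M, X, axes=([1], [axis])), 0, axis)
    return X

def tikhonov_kron(K_hat: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Tikhonov solution for K = K_hat ⊗ ... ⊗ K_hat using only the SVD of K_hat."""
    U, s, Vt = np.linalg.svd(K_hat)              # K_hat = U diag(s) Vt
    S = reduce(np.multiply.outer, [s] * B.ndim)  # singular values of K as a d-way tensor
    Y = apply_kron(U.T, B)                       # (U ⊗ ... ⊗ U)^T vec(B)
    Y = Y * S / (S**2 + alpha)                   # Tikhonov filter sigma / (sigma^2 + alpha)
    return apply_kron(Vt.T, Y)                   # map back with V ⊗ ... ⊗ V

n, d, alpha = 6, 3, 1e-4
K_hat = np.tril(np.ones((n, n))) / n             # hypothetical 1D system matrix
B = np.random.default_rng(3).random((n,) * d)    # hypothetical right-hand side tensor
C_fast = tikhonov_kron(K_hat, B, alpha)
print(C_fast.shape)
```

For small n and d the result can be compared with a dense solve of (KᵀK + αI)c = Kᵀb. Each axis transform costs O(n^(d+1)) operations, whereas forming K densely requires n^(2d) entries and quickly becomes infeasible.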

A typical property of Tikhonov regularization is that genuine peaks in the density are 
smoothed. This effect is particularly visible for the Student copula density. Hence, the 
reconstruction quality could be improved by using other regularization methods. 
In inverse problem theory, it is well known that Tikhonov regularization corresponds 
to an L²-norm penalty on the regularized solution. Therefore, L¹ penalties or 
total variation penalties (see [7]) seem more suitable. 
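As an illustration of replacing the quadratic penalty by an L¹ penalty, the sketch below minimizes ½‖Kc − b‖² + λ‖c‖₁ with the iterative soft-thresholding algorithm (ISTA), a standard proximal gradient method. This is a generic example, not a method proposed in this chapter; the forward matrix, the two-peak test density, the noise level, and λ are hypothetical choices.

```python
import numpy as np

def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Proximal map of tau * ||x||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(K: np.ndarray, b: np.ndarray, lam: float, n_iter: int = 500):
    """Minimize 0.5 * ||K c - b||_2^2 + lam * ||c||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(K, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(K.shape[1])
    objective = []
    for _ in range(n_iter):
        gradient = K.T @ (K @ c - b)
        c = soft_threshold(c - step * gradient, step * lam)
        objective.append(0.5 * np.sum((K @ c - b) ** 2) + lam * np.abs(c).sum())
    return c, objective

n = 50
K = np.tril(np.ones((n, n))) / n               # hypothetical 1D forward matrix
c_true = np.zeros(n)
c_true[[10, 30]] = [20.0, 30.0]                # hypothetical two-peak density, mass h * (20 + 30) = 1
rng = np.random.default_rng(5)
b = K @ c_true + 1e-3 * rng.standard_normal(n)
c_l1, objective = ista(K, b, lam=1e-4)
print(objective[0], objective[-1])
```

With the step size 1/‖K‖₂², each ISTA iteration does not increase the objective, so the recorded values form a non-increasing sequence.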

Furthermore, the approximated copula 

$$C_h(u) = \int_{[0,u]} c_h(s)\,ds = \sum_{j=1}^{N} c_j \Phi_j(u)$$
Fig. 8 Regularized Frank copula density, θ = 4, n = 50. (a) α = 0, T = 1,000 samples; (b) α = 10⁻⁸, 
T = 1,000 samples; (c) α = 0, T = 10,000 samples; (d) α = 10⁻⁸, T = 10,000 samples 


should yield the typical properties of copulas. For example, the requirement 
C_h(1, ..., 1) = 1 yields a linear normalization condition on the sum $\sum_{j=1}^{N} c_j$, 
and the requirements 

$$C_h(1, \dots, 1, u_k, 1, \dots, 1) = u_k, \qquad k = 1, \dots, d,$$

lead to additional conditions on the vector c, which together can be used to build 
problem-specific regularization methods. 
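These copula conditions are linear in the coefficients, so they are easy to monitor. The sketch below assumes, hypothetically, that c_h is piecewise constant on the n^d cells of a uniform grid with side h = 1/n, with coefficient tensor c. Then C_h(1, ..., 1) = h^d Σ c_j, and C_h(1, ..., u_k, ..., 1) at the grid points u_k = ih is h^d times a partial sum over cells; since C_h is then piecewise linear in u_k, checking the grid points suffices. The function name is illustrative, and its residuals could serve as constraints or penalty terms.

```python
import numpy as np

def constraint_residuals(c: np.ndarray) -> tuple[float, float]:
    """Deviation of C_h from the copula boundary conditions.

    Assumes (hypothetically) that c holds the values of a piecewise constant
    density on the n x ... x n cells of a uniform grid with h = 1/n.
    Returns (|C_h(1,...,1) - 1|, max_k max_i |C_h(1,..,u_k = i*h,..,1) - i*h|).
    """
    n, d = c.shape[0], c.ndim
    h = 1.0 / n
    mass = h**d * c.sum()                              # C_h(1, ..., 1)
    grid = np.arange(1, n + 1) * h                     # grid points u_k = i * h
    worst = 0.0
    for k in range(d):
        others = tuple(a for a in range(d) if a != k)
        margin = h**d * np.cumsum(c.sum(axis=others))  # C_h(1,..,u_k,..,1) at grid points
        worst = max(worst, float(np.abs(margin - grid).max()))
    return float(abs(mass - 1.0)), worst

print(constraint_residuals(np.ones((20, 20, 20))))    # independence copula density
noisy = 1.0 + 0.1 * np.random.default_rng(4).standard_normal((20, 20, 20))
print(constraint_residuals(noisy))                    # perturbed coefficients
```

For the independence copula density both residuals vanish up to rounding, while unconstrained noisy coefficients generally violate the uniform-margin conditions.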


Open Access This chapter is distributed under the terms of the Creative Commons Attribution 
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in 
any medium, provided the original author(s) and source are credited. 